# Getting Python Set Up

We will be working with Python 3 (the latest major version of Python). If you do not already have Python installed on your computer, we recommend installing it with Anaconda.

For Windows users, follow [this guide](https://medium.com/@GalarnykMichael/install-python-on-windows-anaconda-c63c7c3d1444)

For macOS users, follow [this guide](https://medium.com/@HarryWang/how-to-setup-mac-for-python-development-37e5fd895151)

# Jupyter notebook

An interactive Python environment for rapid development

Supports markdown syntax, allowing for robust **text formatting** like `inline code`

In [None]:
## Also acts as an editor and interpreter
print("Hello World")

__[Cheat sheet on Markdown syntax](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)__

# Programming Basics

Google is your best friend. If you have any questions, try googling "python *insert topic here*".

In general, [w3schools.com](https://www.w3schools.com/python/) and [geeksforgeeks.org](https://www.geeksforgeeks.org/python-programming-language/) are reliable sources for reference.

## Declaring Variables

In [1]:
x = 5
y = 7 

In [2]:
x + y

12

In [3]:
x += 5 

In [4]:
x

10

Python has multiple types of data beyond integers

In [None]:
string_type = "list of characters"

In [None]:
boolean_type = True

In [None]:
float_type = 4.56 

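It is important to know the **type of your data**, because some expressions can't be evaluated with the wrong data type. For example, the cell below raises a `TypeError` because a string and a float cannot be added together.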

In [None]:
string_type + float_type
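A minimal sketch of what happens when that expression runs, and one way to make it work, assuming the goal is to join the two values as text. The name `combined` is only for illustration.

```python
string_type = "list of characters"
float_type = 4.56

# Adding a str and a float raises a TypeError
try:
    string_type + float_type
except TypeError as err:
    print("TypeError:", err)

# Converting the float to a string first makes the concatenation valid
combined = string_type + str(float_type)
print(combined)
```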

## Data Structures

Data structures are used to hold multiple values in one place

**Lists** can hold multiple values of any data type

In [None]:
example_list_of_ints = [14,64,23,60,40]

In [None]:
len(example_list_of_ints) ## Get the length of a list

Access the value at a specific index of a list with brackets

In [None]:
example_list_of_ints[2] ## index starts at 0
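A few more list operations, sketched on the same list of integers. The variable names `first`, `third`, and `last` are only for illustration.

```python
example_list_of_ints = [14, 64, 23, 60, 40]

first = example_list_of_ints[0]    # index 0 is the first element
third = example_list_of_ints[2]    # index 2 is the third element
last = example_list_of_ints[-1]    # negative indexes count from the end

# Lists are mutable: append adds a value to the end
example_list_of_ints.append(7)

print(first, third, last)
print(len(example_list_of_ints))
```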

A **dictionary** is a collection of key/value pairs. Every key is unique and is used to locate its value

[more info on python dictionaries](https://www.w3schools.com/python/python_dictionaries.asp)

In [None]:
dict_quest_TAs = {'190':['Derek','Matt'],'390':['Michael'],'490':['Allie']}

What is this output going to be?

In [None]:
dict_quest_TAs[190]
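Something to keep in mind when answering: the keys in this dictionary are strings, so the integer `190` and the string `'190'` are different keys. A short sketch of how to check for a key before using it (`tas_for_190` and `missing` are illustrative names):

```python
dict_quest_TAs = {'190': ['Derek', 'Matt'], '390': ['Michael'], '490': ['Allie']}

# Membership tests look at the keys, and key types must match exactly
print('190' in dict_quest_TAs)
print(190 in dict_quest_TAs)

# Look up with the string key that was actually stored
tas_for_190 = dict_quest_TAs['190']

# .get() returns None instead of raising KeyError when the key is missing
missing = dict_quest_TAs.get(190)
print(tas_for_190, missing)
```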

## If Statements

The expression inside the parentheses is evaluated as a boolean

In [None]:
x = 10

In [None]:
if (x == 10):
    x += 5
    print("x is " + str(x)) ## str(x) to convert int to string 

`if...else` statements can be used to control execution

In [None]:
if ('395' in dict_quest_TAs):   ## "in" operator checks if a value is in the keyset of a dictionary 
    print(dict_quest_TAs['395'])
else:
    print("395 is not a QUEST course")

`if...elif...else` statements can be used to check several conditions in order

In [None]:
y = 7

In [None]:
if (y < 8):
    print("y < 8")
elif (y == 8):
    print("y is 8")
else:
    print("y > 8")

## For Loops & While Loops

Loops can be used to execute a segment of code multiple times

In [None]:
random_numbers_list = [1,4,40,23,100,24,55]

`for` loops are good for executing code a specific number of times

In [None]:
for i in random_numbers_list:
    if i > 20:
        print(i)
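Two common variations on the `for` loop, as a small sketch: `range` to repeat a fixed number of times, and `enumerate` when the position of each value is needed as well. The names `squares` and `large_values` are only for illustration.

```python
random_numbers_list = [1, 4, 40, 23, 100, 24, 55]

# range(3) yields 0, 1, 2, so the body runs exactly three times
squares = []
for n in range(3):
    squares.append(n * n)

# enumerate yields (index, value) pairs while looping over a list
large_values = []
for index, value in enumerate(random_numbers_list):
    if value > 20:
        large_values.append((index, value))

print(squares)
print(large_values)
```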

If you don't know how many times to run the code, use a `while` loop

In [None]:
## print all values in random_numbers_list up until the first value that is >= 10
i = 0
while (i < len(random_numbers_list) and random_numbers_list[i] < 10):  ## bounds check avoids an IndexError if every value is < 10
    print(random_numbers_list[i])
    i += 1

## Methods

Methods (more precisely functions, since a method is a function attached to an object) are used to encapsulate a chunk of code

This makes for cleaner code, and users don't have to see the implementation

In [None]:
## write a function to compute x to the power of y (for y >= 0)

def powerFunction(x: int, y: int):
    answer = 1
    for i in range(y):
        answer*=x
    return answer
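A short usage sketch for the function above, compared against Python's built-in `**` operator. The function is repeated here so the example runs on its own; the return annotation and docstring are additions.

```python
def powerFunction(x: int, y: int) -> int:
    """Return x raised to the power y by repeated multiplication (y >= 0)."""
    answer = 1
    for _ in range(y):
        answer *= x
    return answer

result = powerFunction(2, 10)
print(result)
print(result == 2 ** 10)  # ** is Python's exponent operator
print(2 ^ 10)             # ^ is bitwise XOR in Python, not exponentiation
```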

# Data Analysis in Python

**Congrats!** You now know all the programming basics to write your own Python program

Let's learn about some open source Python packages that we can leverage in our data analysis

## Importing Packages

On Windows, installing packages can work differently depending on your setup. If `pip install pandas` does not work for you, try [one of these solutions](https://stackoverflow.com/questions/1449494/how-do-i-install-python-packages-on-windows)

If you have a Mac or Unix system, this command should work: `pip3 install pandas`

In [2]:
import pandas as pd

## Importing our Dataset

Go to [Kaggle](https://www.kaggle.com) and download the dataset that you want to analyze

Move the dataset into the same folder as this Jupyter notebook<br>
Ex.) This notebook is in my "Documents" folder, so I would move my dataset to the Documents folder as well

I'll be working with a 

In [3]:
## Read in the data 
wine_data = pd.read_csv("wine_data.csv")

In [None]:
type(wine_data)

DataFrames are a data structure used in pandas to represent tabular data
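To get a feel for the structure, here is a tiny DataFrame built by hand. The rows are made-up illustrative values, not taken from the dataset, and `toy_wines` is a hypothetical name.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values
toy_wines = pd.DataFrame({
    "variety": ["Riesling", "Pinot Noir", "Malbec"],
    "points": [87, 90, 85],
    "price": [13.0, 65.0, 20.0],
})

print(type(toy_wines))
print(toy_wines.shape)      # (number of rows, number of columns)
print(toy_wines["price"])   # selecting one column gives a pandas Series
```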

In [8]:
wine_data.head()

| Unnamed: 0.1 | Unnamed: 0 | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 |  | Sicily & Sardinia | Etna |  | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro |  |  | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | 2 | US | Tart and snappy, the flavors of lime flesh and... |  | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore |  | Alexander Peartree |  | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |


In [15]:
len(wine_data)

129971

In [36]:
wine_data['title'].nunique()

118840

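Each row describes a bottle of wine and its vintage, and the columns are different attributes of the wine (price, points, region, etc.). Because there are 129,971 rows but only 118,840 unique titles, some titles appear more than once.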

### Cleaning

In [4]:
clean_wine_data = wine_data.dropna()

In [18]:
len(clean_wine_data)

22387

In [5]:
clean_wine_data = wine_data.dropna(subset=["price","points","variety","country","region_1"]).copy(deep=True)

In [6]:
len(clean_wine_data)

101400
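The difference between the two `dropna` calls above can be seen on a toy table. This is a sketch with made-up values: `dropna()` with no arguments drops every row that has a missing value in any column, while `subset=` only checks the listed columns, so far fewer rows are lost.

```python
import pandas as pd

# Made-up rows; None marks a missing value
toy = pd.DataFrame({
    "price": [15.0, None, 14.0],
    "points": [87, 87, 90],
    "region_2": [None, None, "Willamette Valley"],
})

# Drops any row with a missing value in ANY column
all_complete = toy.dropna()

# Drops a row only if one of the listed columns is missing
needed_only = toy.dropna(subset=["price", "points"])

print(len(toy), len(all_complete), len(needed_only))
```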

### Example Questions

**What is the most popular variety of wine?**

In [38]:
clean_wine_data['variety'].value_counts()

Pinot Noir                  12787
Chardonnay                  11080
Cabernet Sauvignon           9386
Red Blend                    8476
Bordeaux-style Red Blend     5340
Riesling                     4972
Sauvignon Blanc              4783
Syrah                        4086
Rosé                         3262
Merlot                       3062
Zinfandel                    2708
Malbec                       2593
Sangiovese                   2377
Nebbiolo                     2331
Portuguese Red               2196
White Blend                  2172
Sparkling Blend              2027
Tempranillo                  1789
Rhône-style Red Blend        1405
Pinot Gris                   1391
Cabernet Franc               1305
Champagne Blend              1211
Grüner Veltliner             1145
Pinot Grigio                 1002
Portuguese White              986
Viognier                      985
Gewürztraminer                956
Gamay                         836
Shiraz                        822
Petite Sirah  

**Which varieties of wine are most expensive on average?**

In [42]:
clean_wine_data.groupby('variety')['price'] \
    .mean() \
    .sort_values(ascending = False) \
    .head()

variety
Ramisco             495.000000
Terrantez           236.000000
Francisa            160.000000
Rosenmuskateller    150.000000
Malbec-Cabernet     113.333333
Name: price, dtype: float64
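To see what the `groupby` chain is doing, here is the same pattern on a toy table with made-up prices: rows are grouped by variety, the price column is averaged within each group, and the result is sorted from most to least expensive.

```python
import pandas as pd

# Made-up prices for illustration only
toy = pd.DataFrame({
    "variety": ["Riesling", "Riesling", "Malbec"],
    "price": [10.0, 20.0, 30.0],
})

avg_price = (
    toy.groupby("variety")["price"]   # one group per variety
    .mean()                           # average price within each group
    .sort_values(ascending=False)     # most expensive first
)
print(avg_price)
```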

**Which wines are the best value?**

Simple heuristic for value: quality / price 

In [45]:
clean_wine_data['value'] = clean_wine_data['points'] / clean_wine_data['price']

In [56]:
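## Select several columns with a list (double brackets); newer pandas versions reject ['value','price']
value_wine = clean_wine_data \
    .groupby('variety')[['value','price']] \
    .mean() \
    .sort_values(by = 'value', ascending = False)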

Do you guys see any problems here? Ideas on how to fix it?
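One possible issue (an assumption, not necessarily the answer intended here) is that a variety with only one or two rows can top the ranking by chance. A hedged sketch on made-up data: count the rows per variety alongside the averages and keep only varieties above a threshold. The names `summary`, `reliable`, `n_rows`, and `min_rows` are hypothetical.

```python
import pandas as pd

# Made-up rows: variety "B" has a single, very cheap bottle
toy = pd.DataFrame({
    "variety": ["A", "A", "A", "B"],
    "points": [88, 90, 86, 85],
    "price": [20.0, 30.0, 10.0, 5.0],
})
toy["value"] = toy["points"] / toy["price"]

# Average value and price per variety, plus how many rows back each average
summary = toy.groupby("variety").agg(
    value=("value", "mean"),
    price=("price", "mean"),
    n_rows=("value", "size"),
)

min_rows = 2  # arbitrary threshold chosen for this sketch
reliable = summary[summary["n_rows"] >= min_rows] \
    .sort_values(by="value", ascending=False)
print(reliable)
```

Without the filter, variety "B" would rank first on the strength of a single row.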

# Creating a Dashboard

Plotly Dash is an open-source Python package that can be used to create interactive dashboards

Here's a good guide for [getting started](https://dash.plot.ly/getting-started)

**What are the best value wines?**

Let's create a dashboard to answer this question using more of the data. First we should **draw it out**.

# Additional Resources

[The Hitchhiker's Guide to Python](https://docs.python-guide.org/intro/learning/)