# Exploratory Data Analysis with Pandas

#### Resources

##### YouTube
- https://www.youtube.com/watch?v=vmEHCJofslg
- https://www.youtube.com/playlist?list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS

##### Cheat sheets
- https://www.enthought.com/wp-content/uploads/Enthought-Python-Pandas-Cheat-Sheets-1-8-v1.0.2.pdf

##### Towards Data Science
- https://towardsdatascience.com/how-to-master-pandas-for-data-science-b8ab0a9b1042

##### The documentation
- https://pandas.pydata.org/docs/index.html

__Series:__ A Pandas Series is a one-dimensional labeled array. It is similar to a NumPy array, but its values carry labels (the axis labels, also called the index). A Series can hold integers, floats, strings, Python objects, and so on. At a high level, a Series can be thought of as a column in Microsoft Excel.

__DataFrame:__ A Pandas DataFrame is a two-dimensional labeled table. At a high level, a DataFrame can be thought of as a spreadsheet in Microsoft Excel (i.e. an M × N matrix where M denotes the number of rows and N the number of columns).
(from https://towardsdatascience.com/how-to-master-pandas-for-data-science-b8ab0a9b1042)
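
To make the two definitions concrete, here is a minimal sketch that builds a Series and a DataFrame by hand. The drink names and numbers are made up for illustration and are not taken from the Starbucks file.

```python
import pandas as pd

# Hypothetical labels and values, for illustration only
labels = ["Brewed Coffee", "Latte", "Cappuccino"]

# A Series: one-dimensional values plus an index of labels
calories = pd.Series([3, 70, 110], index=labels, name="Calories")

# A DataFrame: several columns that share one index
drinks = pd.DataFrame(
    {"Calories": [3, 70, 110], "Caffeine_mg": [175, 75, 75]},
    index=labels,
)

print(calories["Latte"])         # label-based lookup on a Series
print(drinks.shape)              # (number of rows, number of columns)
print(type(drinks["Calories"]))  # each column of a DataFrame is a Series
```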

In [None]:
import pandas as pd

## Load and Inspect

In [None]:
df = pd.read_csv("starbucks_drinkMenu_expanded.csv")

### head(), tail(), sample()

In [None]:
df.head(n=2)

In [None]:
df.sample(2)

In [None]:
df.columns

In [None]:
df[" Sugars (g)"].values

In [None]:
df = df.rename(columns={" Sugars (g)": "sugars_g"})
df.columns

In [None]:
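# regex=False treats "(" and ")" as literal characters rather than regex syntax
df.columns = df.columns.str.replace(" ", "", regex=False)
df.columns = df.columns.str.replace("(", "_", regex=False)
df.columns = df.columns.str.replace(")", "", regex=False)
df.columns = df.columns.str.replace("%", "perc_", regex=False)
df.columns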

### Summaries

In [None]:
df.shape

In [None]:
df.dtypes

In [None]:
df.describe(include="all")

#### Identify missing values

In [None]:
df.isna()

In [None]:
df2 = df.dropna()

In [None]:
df2.shape
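
`df.isna()` returns a full table of booleans, which is hard to read for anything but tiny data. A common follow-up is to count missing values per column and to look at the affected rows before dropping them. The sketch below uses a small made-up frame (`toy` is a hypothetical name) so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing caffeine value
toy = pd.DataFrame({
    "Beverage": ["A", "B", "C"],
    "Caffeine_mg": [75.0, np.nan, 150.0],
})

missing_per_column = toy.isna().sum()            # number of NaN values in each column
rows_with_missing = toy[toy.isna().any(axis=1)]  # rows with at least one NaN
complete_rows = toy.dropna()                     # rows without any NaN

print(missing_per_column)
print(rows_with_missing)
print(complete_rows.shape)
```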

## Select and Filter

In [None]:
df.Beverage_category.unique()

In [None]:
df_coffee = df.loc[(df.Beverage_category=="Coffee")].copy() # .copy() avoids a SettingWithCopyWarning when df_coffee is modified later
df_coffee.head()
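
The effect of `.copy()` can be checked on a small example: after copying, changes to the filtered frame leave the original untouched. This is a sketch on made-up data; `toy` and `coffee` are hypothetical names.

```python
import pandas as pd

# Made-up data, for illustration only
toy = pd.DataFrame({
    "Beverage_category": ["Coffee", "Coffee", "Smoothies"],
    "Calories": [3, 5, 280],
})

# Filter, then take an explicit copy so the result is independent of toy
coffee = toy.loc[toy["Beverage_category"] == "Coffee"].copy()
coffee["Calories"] = coffee["Calories"] * 2

print(coffee["Calories"].tolist())  # modified values in the copy
print(toy["Calories"].tolist())     # the original frame is unchanged
```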

### Let's convert g to mg

In [None]:
df_coffee["Protein_mg"] = df_coffee.Protein_g*1000  # 1 g = 1000 mg; store the result under a new name so Protein_g stays in grams

### Do the same for TotalFat_g and sugars_g

#### Does it work? Why not?

In [None]:
df_coffee["TotalFat_mg"] = df_coffee.TotalFat_g*1000  # if the result looks wrong, check df_coffee.dtypes
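
If a multiplication like the one above repeats text instead of scaling numbers, the column is probably stored as strings (`object` dtype), which typically happens when at least one entry cannot be parsed as a number. The sketch below reproduces the effect on made-up data (the entry `"3 2"` is an assumed example of a malformed value) and shows one possible fix with `pd.to_numeric`.

```python
import pandas as pd

# Made-up column; "3 2" stands in for a malformed entry
toy = pd.DataFrame({"TotalFat_g": pd.Series(["0.5", "3 2", "1.5"], dtype=object)})

print(toy["TotalFat_g"].dtype)    # object: the values are strings
repeated = toy["TotalFat_g"] * 2  # repeats each string instead of doubling a number
print(repeated.tolist())

# errors="coerce" turns entries that cannot be parsed into NaN
fat_g = pd.to_numeric(toy["TotalFat_g"], errors="coerce")
toy["TotalFat_mg"] = fat_g * 1000
print(toy)
```

Coercing silently replaces bad entries with `NaN`, so it is worth inspecting those rows before deciding whether to drop or correct them.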

### Sort from largest to smallest

In [None]:
df_coffee.sort_values("Beverage_prep", ascending=False)  # sorts the size names alphabetically, not by cup size

### Recode the size to numeric and sort by size

In [None]:
df_coffee["drink_size"] = df_coffee["Beverage_prep"].map({"Short": 1, "Tall": 2, "Grande": 3, "Venti": 4})

In [None]:
df_coffee.sort_values("drink_size", ascending=False)
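
An alternative to a numeric helper column is an ordered categorical, which keeps the readable size names while sorting in a chosen order. This is a sketch on made-up data and assumes the column contains only the four size names.

```python
import pandas as pd

# Made-up data, for illustration only
toy = pd.DataFrame({"Beverage_prep": ["Tall", "Venti", "Short", "Grande"]})

# Declare the sizes as ordered categories, from smallest to largest
size_order = ["Short", "Tall", "Grande", "Venti"]
toy["Beverage_prep"] = pd.Categorical(
    toy["Beverage_prep"], categories=size_order, ordered=True
)

# Sorting now follows the category order instead of the alphabet
largest_first = toy.sort_values("Beverage_prep", ascending=False)
print(largest_first)
```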

### Sort by multiple columns

In [None]:
df.sort_values(["Beverage_prep", "Caffeine_mg"], ascending=False).head()

In [None]:
### Index and Multiindex

## Grouping

In [None]:
df.groupby("Beverage_category")["Beverage"].count()

In [None]:
df3 = df.groupby("Beverage_category")
df3.groups
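
Beyond counting, a grouped frame can compute several summaries at once with `agg`. The sketch below uses named aggregation on made-up data; the output column names (`n_drinks`, `mean_calories`, `max_sugars_g`) are hypothetical.

```python
import pandas as pd

# Made-up data, for illustration only
toy = pd.DataFrame({
    "Beverage_category": ["Coffee", "Coffee", "Smoothies", "Smoothies"],
    "Calories": [3, 5, 280, 300],
    "sugars_g": [0, 0, 34, 36],
})

# Named aggregation: new_column=(source_column, aggregation function)
summary = toy.groupby("Beverage_category").agg(
    n_drinks=("Calories", "size"),
    mean_calories=("Calories", "mean"),
    max_sugars_g=("sugars_g", "max"),
)
print(summary)
```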

### Pivot and melt

In [None]:
df_smoothies = df[df.Beverage_category=="Smoothies"]

In [None]:
df_smoothies_mini = df_smoothies[["Beverage", "Beverage_prep", "TotalFat_g"]].copy()

In [None]:
df_smoothies_mini.shape

In [None]:
df_smoothies_mini = df_smoothies_mini.pivot(index="Beverage", columns="Beverage_prep", values=["TotalFat_g"])
df_smoothies_mini.head()
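
`melt` is the reverse of `pivot`: it turns a wide table back into a long one, with one row per combination of beverage and preparation. A minimal sketch on made-up data (the smoothie names and fat values are invented):

```python
import pandas as pd

# Made-up wide table: one row per beverage, one column per preparation
wide = pd.DataFrame({
    "Beverage": ["Smoothie A", "Smoothie B"],
    "2% Milk": [5.0, 1.5],
    "Soymilk": [4.5, 1.0],
})

# Wide to long: the preparation columns become rows
long = wide.melt(id_vars="Beverage", var_name="Beverage_prep", value_name="TotalFat_g")
print(long)

# Long to wide again: pivot recovers the original layout
wide_again = long.pivot(index="Beverage", columns="Beverage_prep", values="TotalFat_g")
print(wide_again)
```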

## Plotting with pandas

In [None]:
df.Caffeine_mg.unique()

In [None]:
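# drop rows where caffeine is zero, unspecified, or missing, then convert to float
df1 = df[~df.Caffeine_mg.isin(["0", "varies", "Varies"])].copy()
df1 = df1[~df1.Caffeine_mg.isna()].copy()  # build the mask from df1 so the indexes align
df1.Caffeine_mg = df1.Caffeine_mg.astype(float)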

In [None]:
df1.Caffeine_mg.plot(kind="hist", bins=20)

In [None]:
df1.dtypes

In [None]:
grouped_by_prep = df1.groupby("Beverage_prep")
grouped_by_prep.plot.scatter(x="Caffeine_mg", y="sugars_g")  # draws one scatter plot per preparation

In [None]:
df_coffee.plot(kind="bar", x="Beverage_prep", y="Protein_g")  # use the preparation as bar labels

## Multiindex

In [None]:
df = pd.read_csv("starbucks_drinkMenu_expanded.csv")
print(df.index.names)  # the default index has no name
df = df.set_index(["Beverage_category", "Beverage_prep"])

print(df.index.names)
print(df.index.values)
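
Once a frame has a MultiIndex, rows can be selected by one or both levels. The sketch below shows a few common patterns on made-up data; it is not tied to the contents of the Starbucks file.

```python
import pandas as pd

# Made-up data, for illustration only
toy = pd.DataFrame({
    "Beverage_category": ["Coffee", "Coffee", "Smoothies"],
    "Beverage_prep": ["Short", "Tall", "Soymilk"],
    "Calories": [3, 4, 290],
}).set_index(["Beverage_category", "Beverage_prep"])

coffee_rows = toy.loc["Coffee"]                        # all rows under the outer label "Coffee"
tall_coffee = toy.loc[("Coffee", "Tall"), "Calories"]  # one row and one column
tall_any = toy.xs("Tall", level="Beverage_prep")       # select on the inner level only
flat = toy.reset_index()                               # move the index levels back into columns

print(coffee_rows)
print(tall_coffee)
print(tall_any)
print(flat.columns.tolist())
```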

### Issues
- Setting with copy: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

- Index/MultiIndex: stack and unstack
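
For the stack/unstack item, here is a minimal sketch on made-up data: `unstack` moves an index level into the columns, and `stack` moves the columns back into the index.

```python
import pandas as pd

# Made-up data with a two-level index, for illustration only
toy = pd.DataFrame({
    "Beverage": ["Smoothie A", "Smoothie A", "Smoothie B", "Smoothie B"],
    "Beverage_prep": ["2% Milk", "Soymilk", "2% Milk", "Soymilk"],
    "TotalFat_g": [5.0, 4.5, 1.5, 1.0],
}).set_index(["Beverage", "Beverage_prep"])

wide = toy["TotalFat_g"].unstack("Beverage_prep")  # inner index level becomes columns
long = wide.stack()                                # columns become the inner index level again

print(wide)
print(long)
```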

#### Exercises
- Compute the average caffeine content.
- Compute the average caffeine content by drink type.
- Find the drink with the most caffeine.