# DataFrame Manipulation Warmup

In [1]:
import numpy as np
import pandas as pd

np.random.seed(406)

n = 5000
df = pd.DataFrame({
    'favorite_animal': np.random.choice(['cat', 'dog', 'frog', 'lemur', 'panda'], n),
    'favorite_vegetable': np.random.choice(['brussel sprouts', 'potato', 'squash'], n),
    'favorite_fruit': np.random.choice(['banana', 'apple', 'blueberries'], n),
    'wears_glasses': np.random.choice(['yes', 'no'], n),
    'netflix_consumption': np.random.normal(10, 2, n),
    'open_browser_tabs': np.random.randint(2, 90, n),
})

- What is the highest amount of Netflix consumption? `17.535`
- How many people wear glasses? What percentage of people is this? `2555`, `0.511` (51.1%)
- How many people's favorite animal is a dog? `1002`
- What is the most common favorite animal? `lemur`
- What is the average Netflix consumption for people who prefer brussel
  sprouts? `10.008`
- What is the most common favorite fruit for people who wear glasses and have
  more than 40 open browser tabs? `blueberries`
- What percentage of people have a Netflix consumption lower than 7? `0.0716` (7.16%)
- What is the average Netflix consumption for people with fewer than 30 open
  browser tabs? `9.919`
- How many people *don't* wear glasses, have a favorite animal of a panda, have
  a favorite fruit of blueberries, and have more than 60 open browser tabs? What
  is the median Netflix consumption for this group? What is the most common
  favorite vegetable for this group? `46`, `10.455`, `potato`
- What is the least popular combination of favorite fruit and vegetable? `apple` and `potato`
- Which combination of favorite animal and wearing glasses has the highest average
  Netflix consumption? `yes` (wears glasses) and `panda`
- **Bonus**: for each of the above questions, what kind of visualization would
  be the most effective in conveying your answer?

In [7]:
df.netflix_consumption.max()

17.534818515438925

In [83]:
(df.wears_glasses == 'yes').mean()

0.511
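The cell above returns only the proportion, while the question also asks for a head count. A minimal sketch of getting both, using a small made-up frame (`toy` is hypothetical and stands in for the notebook's `df`):

```python
import pandas as pd

# Hypothetical toy data; the notebook's real `df` has 5000 rows.
toy = pd.DataFrame({'wears_glasses': ['yes', 'no', 'yes', 'yes']})

# Summing a boolean Series counts the True values.
n_glasses = (toy.wears_glasses == 'yes').sum()

# value_counts gives the count for every category at once;
# normalize=True turns the counts into proportions.
counts = toy.wears_glasses.value_counts()
shares = toy.wears_glasses.value_counts(normalize=True)

print(n_glasses, counts['yes'], shares['yes'])
```

On the real data the same calls would be made on `df.wears_glasses`.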

In [15]:
(df.favorite_animal == 'dog').sum()

1002

In [30]:
df.favorite_animal.value_counts().index[0]

'lemur'
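`value_counts().index[0]` relies on `value_counts` sorting in descending order. Two alternatives that state the intent more directly are `idxmax` on the counts and `Series.mode`. The sketch below uses a made-up frame (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's `df`.
toy = pd.DataFrame({
    'favorite_animal': ['lemur', 'cat', 'lemur', 'dog', 'lemur', 'cat'],
})

counts = toy.favorite_animal.value_counts()

# Label of the largest count, independent of how the counts are sorted.
most_common = counts.idxmax()

# mode() returns a Series, because several values can tie for first place.
modes = toy.favorite_animal.mode()

print(most_common, modes.tolist())
```

`mode` is the safer choice when ties matter, since `idxmax` reports only one label.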

In [36]:
df.netflix_consumption[df.favorite_vegetable == 'brussel sprouts'].mean()

10.00847154798366

In [48]:
df.favorite_fruit[(df.wears_glasses == 'yes') & 
            (df.open_browser_tabs > 40)].value_counts().index[0]

'blueberries'

In [84]:
(df.netflix_consumption < 7).mean()

0.0716

In [50]:
(df.netflix_consumption[df.open_browser_tabs < 30]).mean()

9.91935736918227

In [85]:
((df.wears_glasses == 'no') & 
 (df.favorite_animal == 'panda') & 
 (df.favorite_fruit == 'blueberries') & 
 (df.open_browser_tabs > 60)).sum()

46

In [87]:
mask = ((df.wears_glasses == 'no') & 
 (df.favorite_animal == 'panda') & 
 (df.favorite_fruit == 'blueberries') & 
 (df.open_browser_tabs > 60))
mask.sum()

46

In [81]:
df.netflix_consumption[(df.wears_glasses == 'no') & 
    (df.favorite_animal == 'panda') & 
    (df.favorite_fruit == 'blueberries') & 
    (df.open_browser_tabs > 60)].median()

10.45479760071613

In [88]:
df[mask].netflix_consumption.median()

10.45479760071613

In [86]:
df.favorite_vegetable[(df.wears_glasses == 'no') & 
    (df.favorite_animal == 'panda') & 
    (df.favorite_fruit == 'blueberries') & 
    (df.open_browser_tabs > 60)].value_counts().index[0]

'potato'

In [89]:
df[mask].favorite_vegetable.value_counts().index[0]

'potato'
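Storing the condition in `mask` avoids retyping it for each follow-up question. Two other ways to write the same selection are `.loc` with a row mask and a column label, and `DataFrame.query` with a string expression. This is a sketch on made-up data (`toy` is hypothetical, and the fruit condition is dropped to keep it short):

```python
import pandas as pd

# Hypothetical toy data with a subset of the notebook's columns.
toy = pd.DataFrame({
    'wears_glasses': ['no', 'no', 'yes', 'no'],
    'favorite_animal': ['panda', 'panda', 'panda', 'cat'],
    'open_browser_tabs': [70, 65, 80, 61],
    'netflix_consumption': [9.0, 11.0, 12.0, 8.0],
})

mask = ((toy.wears_glasses == 'no')
        & (toy.favorite_animal == 'panda')
        & (toy.open_browser_tabs > 60))

# .loc selects rows by the boolean mask and one column by label in a single step.
median_loc = toy.loc[mask, 'netflix_consumption'].median()

# query() expresses the same filter as a string.
expr = ("wears_glasses == 'no' and favorite_animal == 'panda' "
        "and open_browser_tabs > 60")
median_query = toy.query(expr).netflix_consumption.median()

print(median_loc, median_query)
```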

What is the least popular combination of favorite fruit and vegetable? apple and potato

In [56]:
df.head()

|   | favorite_animal | favorite_vegetable | favorite_fruit | wears_glasses | netflix_consumption | open_browser_tabs |
|---|---|---|---|---|---|---|
| 0 | lemur | potato | apple | yes | 8.313351 | 44 |
| 1 | panda | potato | apple | yes | 11.801073 | 10 |
| 2 | cat | squash | blueberries | yes | 10.105141 | 35 |
| 3 | lemur | squash | apple | no | 11.024605 | 70 |
| 4 | dog | brussel sprouts | apple | yes | 6.732698 | 73 |


In [65]:
df['fruit_and_veggie'] = (df.favorite_fruit 
                    + ' and ' + df.favorite_vegetable)

In [68]:
df.fruit_and_veggie.value_counts().index[-1]

'apple and potato'

In [90]:
df.groupby(['favorite_fruit', 'favorite_vegetable']).size().sort_values()

favorite_fruit  favorite_vegetable
apple           potato                512
banana          squash                524
apple           squash                555
blueberries     brussel sprouts       555
                potato                560
apple           brussel sprouts       565
banana          potato                570
                brussel sprouts       576
blueberries     squash                583
dtype: int64
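Grouping by both columns gives the counts without building a helper column such as `fruit_and_veggie`. A minimal sketch on made-up data (`toy` is hypothetical) that pulls out the rarest pair with `idxmin` and shows the same counts as a grid with `pd.crosstab`:

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's `df`.
toy = pd.DataFrame({
    'favorite_fruit': ['apple', 'apple', 'apple',
                       'banana', 'banana', 'banana', 'banana'],
    'favorite_vegetable': ['potato', 'squash', 'squash',
                           'potato', 'potato', 'squash', 'squash'],
})

# One count per (fruit, vegetable) pair, indexed by a MultiIndex.
sizes = toy.groupby(['favorite_fruit', 'favorite_vegetable']).size()

# idxmin returns the index label of the smallest count, here a tuple.
least_popular = sizes.idxmin()

# The same counts laid out as a fruit-by-vegetable grid.
grid = pd.crosstab(toy.favorite_fruit, toy.favorite_vegetable)

print(least_popular)
print(grid)
```

Like `index[-1]` on sorted counts, `idxmin` reports a single pair, so ties need a separate check.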

Which combination of favorite animal and wearing glasses has the highest average Netflix consumption? People who wear glasses and prefer pandas

In [71]:
# Despite its name, 'ave_cons' holds a "glasses, animal" label, not an average.
df['ave_cons'] = df.wears_glasses + ', ' + df.favorite_animal
df.head()

|   | favorite_animal | favorite_vegetable | favorite_fruit | wears_glasses | netflix_consumption | open_browser_tabs | fruit_and_veggie | ave_cons |
|---|---|---|---|---|---|---|---|---|
| 0 | lemur | potato | apple | yes | 8.313351 | 44 | apple and potato | yes, lemur |
| 1 | panda | potato | apple | yes | 11.801073 | 10 | apple and potato | yes, panda |
| 2 | cat | squash | blueberries | yes | 10.105141 | 35 | blueberries and squash | yes, cat |
| 3 | lemur | squash | apple | no | 11.024605 | 70 | apple and squash | no, lemur |
| 4 | dog | brussel sprouts | apple | yes | 6.732698 | 73 | apple and brussel sprouts | yes, dog |


In [80]:
df.groupby(by='ave_cons').netflix_consumption.mean().sort_values().index[-1]


'yes, panda'

In [91]:
df.pivot_table('netflix_consumption', 'favorite_animal', 'wears_glasses')

| favorite_animal | wears_glasses = no | wears_glasses = yes |
|---|---|---|
| cat | 9.846183 | 9.884685 |
| dog | 9.933246 | 10.087352 |
| frog | 9.962311 | 9.834740 |
| lemur | 10.024557 | 10.010196 |
| panda | 9.946293 | 10.092273 |
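The winning combination can also be read off programmatically rather than by scanning the table. One possible approach, sketched on made-up data (`toy` is hypothetical), groups by both columns, takes the mean, and asks for the label of the maximum:

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's `df`.
toy = pd.DataFrame({
    'favorite_animal': ['cat', 'cat', 'panda', 'panda', 'panda'],
    'wears_glasses': ['no', 'yes', 'no', 'yes', 'yes'],
    'netflix_consumption': [9.0, 10.0, 9.5, 11.0, 12.0],
})

# Mean consumption per (animal, glasses) pair, indexed by a MultiIndex.
means = toy.groupby(['favorite_animal', 'wears_glasses']).netflix_consumption.mean()

# idxmax returns the (animal, glasses) tuple with the highest mean.
best = means.idxmax()

# unstack() pivots the glasses level into columns, giving the same layout
# as the pivot table above.
table = means.unstack()

print(best)
print(table)
```

This avoids the concatenated `ave_cons` column and keeps the two keys as separate index levels.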


What is the highest amount of Netflix consumption? 17.535
- bar chart

How many people wear glasses? What percentage of people is this? 2555, 0.511 (51.1%)
- pie chart

How many people's favorite animal is a dog? 1002
- infographic

What is the most common favorite animal? lemur
- bar chart

What is the average Netflix consumption for people who prefer brussel sprouts? 10.008
- box and whisker plot

What is the most common favorite fruit for people who wear glasses and have more than 40 open browser tabs? blueberries
- bar chart

What percentage of people have a Netflix consumption lower than 7? 0.0716 (7.16%)
- pie chart

What is the average Netflix consumption for people with fewer than 30 open browser tabs? 9.919
- box and whisker plot

How many people don't wear glasses, have a favorite animal of a panda, have a favorite fruit of blueberries, and have more than 60 open browser tabs? What is the median Netflix consumption for this group? What is the most common favorite vegetable for this group? 46, 10.455, potato
- combination infographic

What is the least popular combination of favorite fruit and vegetable? apple and potato
- bar chart

Which combination of favorite animal and wearing glasses has the highest average Netflix consumption? People who wear glasses and prefer pandas
- bar chart
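As an illustration of one of the suggested chart types, here is a minimal sketch of a bar chart of favorite-animal counts. It assumes matplotlib is installed and uses made-up data (`toy` is hypothetical). The `Agg` backend is selected only so the sketch runs without a display; inside a notebook that line can be dropped.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Hypothetical toy data standing in for the notebook's `df`.
toy = pd.DataFrame({
    'favorite_animal': ['lemur', 'cat', 'lemur', 'dog', 'lemur', 'cat'],
})

# value_counts sorts descending, so the tallest bar comes first.
counts = toy.favorite_animal.value_counts()

# pandas delegates to matplotlib and returns the Axes it drew on.
ax = counts.plot.bar()
ax.set_xlabel('favorite animal')
ax.set_ylabel('number of people')
ax.set_title('Favorite animal counts')

heights = [bar.get_height() for bar in ax.patches]
print(heights)
```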