# More pandas practice!

<img src="Panda-Rolling-3-Alt.gif"> 

[gif source](https://disneyrewards.com/app/uploads/sites/9/Panda-Rolling-3-Alt.gif)

This week we will look at some more pandas functions and practice using the documentation to understand how they work. 
We'll look at...
- `.apply()`, `Series.map()`, and `DataFrame.map()`
- `.groupby()`, `.agg()` and `.pivot_table()`
- `.join()`, `.merge()`, and `.concat()`

And as a (non-pandas) bonus... `lambda` functions!


**Remember - practice is the best way to get more comfortable coding! You'll likely never know \*all\* the functions in pandas... but the ones you use frequently will stick and you should get comfortable using documentation to use the others!**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ali-rivera/Python-Support-Hours/blob/main/7_Pandas3/Pandas_3_Blank.ipynb)

### Load in Data
This data comes from [Kaggle](https://www.kaggle.com/datasets/giodev11/usstates-dataset)

In [1]:
import pandas as pd
import numpy as np

In [2]:
# which version of pandas are you using - useful for understanding version update issues
print(pd.__version__)

2.0.3


In [3]:
# load in data files

abbrev = pd.read_csv("state-abbrevs.csv")
areas = pd.read_csv("state-areas.csv")
pops = pd.read_csv("state-population.csv")

In [4]:
# show first 5 lines of each df
display(abbrev.head(), areas.head(), pops.head())

Unnamed: 0,state,abbreviation
0,Alabama,AL
1,Alaska,AK
2,Arizona,AZ
3,Arkansas,AR
4,California,CA


Unnamed: 0,state,area (sq. mi)
0,Alabama,52423
1,Alaska,656425
2,Arizona,114006
3,Arkansas,53182
4,California,163707


Unnamed: 0,state/region,ages,year,population
0,AL,under18,2012,1117489.0
1,AL,total,2012,4817528.0
2,AL,under18,2010,1130966.0
3,AL,total,2010,4785570.0
4,AL,under18,2011,1125763.0


## Lambda functions

[Lambda functions](https://www.w3schools.com/python/python_lambda.asp) are small, anonymous functions that you can quickly build and call in Python. A lambda can take one or more inputs, but its body is a single expression, so they are useful when you only need to perform one expression/action.
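
As a rough illustration of the idea (the names `add`, `states`, and `by_length` are made up for this sketch and are not part of the exercise), a lambda can take more than one input and can be passed straight to another function:

```python
# A lambda with two arguments: the body is one expression, and its value is returned
add = lambda a, b: a + b
total = add(2, 3)

# Lambdas are often passed directly to another function, here as a sort key
states = ["Wyoming", "Ohio", "Alabama"]  # made-up sample list
by_length = sorted(states, key=lambda name: len(name))

print(total)
print(by_length)
```

If the logic needs more than one expression, a regular `def` function is usually the clearer choice.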

In [5]:
# a function that takes a value (x) and multiplies it by 2

multiply2 = lambda x: x * 2
multiply2(5)

10

## .join(), .merge(), and .concat()

[.join()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html) is a function that combines the columns of one DataFrame with another. By default it matches rows on the index, so the DataFrames you are joining need to share an index (for example, set `state` as the index of both)!
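
One possible way to see this in action, sketched on small stand-in DataFrames (`abbrev_toy` and `areas_toy` are hypothetical names holding three rows copied from the tables above, so the CSV files are not needed):

```python
import pandas as pd

# Toy stand-ins for abbrev and areas (three rows copied from the tables above)
abbrev_toy = pd.DataFrame({"state": ["Alabama", "Alaska", "Arizona"],
                           "abbreviation": ["AL", "AK", "AZ"]})
areas_toy = pd.DataFrame({"state": ["Alabama", "Alaska", "Arizona"],
                          "area (sq. mi)": [52423, 656425, 114006]})

# Move the shared key into the index of both frames, then join on that index
joined = abbrev_toy.set_index("state").join(areas_toy.set_index("state"))
print(joined)
```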

In [20]:
# Use .join() to combine abbrev and areas


Unnamed: 0_level_0,abbreviation,area (sq. mi)
state,Unnamed: 1_level_1,Unnamed: 2_level_1
Alabama,AL,52423
Alaska,AK,656425
Arizona,AZ,114006
Arkansas,AR,53182
California,CA,163707


[.merge()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) is another function that combines DataFrames, matching rows either on columns or on indexes. By default it merges on the columns that have the same name in both DataFrames, but you can also specify which column(s) you want to merge on.
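
A minimal sketch of merging on differently named key columns, assuming toy data (`area_toy` and `pops_toy` are hypothetical stand-ins built from values shown in this notebook's tables):

```python
import pandas as pd

# Toy stand-ins; the key column is named differently in each frame
area_toy = pd.DataFrame({"state": ["Alabama", "Alaska", "Arizona"],
                         "abbreviation": ["AL", "AK", "AZ"],
                         "area (sq. mi)": [52423, 656425, 114006]})
pops_toy = pd.DataFrame({"state/region": ["AL", "AK", "CA"],
                         "population": [4833722.0, 735132.0, 38332521.0]})

# left_on / right_on name the key column on each side.
# how="inner" (the default) keeps only keys found in both frames.
inner = pops_toy.merge(area_toy, left_on="state/region", right_on="abbreviation")

# how="left" keeps every row of pops_toy; unmatched rows get NaN on the right side
left = pops_toy.merge(area_toy, left_on="state/region",
                      right_on="abbreviation", how="left")

print(inner)
print(left)
```

Here `CA` has no match in `area_toy`, so it is dropped by the inner merge and kept with missing values by the left merge.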

In [None]:
# use .merge() to combine abbrev and areas
merge_area = abbrev.merge(areas, on="state")

In [8]:
# create version of pops (where year=2013 and age=total) to merge with
merge_pops = pops[(pops["year"] == 2013) & (pops["ages"] == "total")][["state/region", "population"]]

In [10]:
# look at pops and area to merge
display(merge_pops.head(), merge_area.head())

Unnamed: 0,state/region,population
9,AL,4833722.0
87,AK,735132.0
103,AZ,6626624.0
185,AR,2959373.0
199,CA,38332521.0


Unnamed: 0,state,abbreviation,area (sq. mi)
0,Alabama,AL,52423
1,Alaska,AK,656425
2,Arizona,AZ,114006
3,Arkansas,AR,53182
4,California,CA,163707


In [11]:
# use merge to combine merge_pops (where year=2013 and age=total) and merge_area
# note that these have different column names for the state abbreviation - we can specify that with the function parameters!
merged_df = merge_pops.merge(merge_area, left_on="state/region", right_on="abbreviation")

[.concat()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) is a top-level pandas function (called as `pd.concat()`, with a list of objects) that concatenates pandas objects either by row (`axis=0`) or by column (`axis=1`).
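
A small sketch of row-wise concatenation, assuming a two-row stand-in for `abbrev` (`abbrev_toy` and `stacked` are names invented here):

```python
import pandas as pd

# Two-row stand-in for abbrev, plus a one-row frame to append
abbrev_toy = pd.DataFrame({"state": ["Wisconsin", "Wyoming"],
                           "abbreviation": ["WI", "WY"]})
new_state = pd.DataFrame({"state": "New State", "abbreviation": "NS"}, index=[0])

# axis=0 stacks rows; ignore_index=True renumbers the result 0..n-1
# instead of keeping the original (duplicated) index labels
stacked = pd.concat([abbrev_toy, new_state], axis=0, ignore_index=True)
print(stacked)
```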

In [19]:
# Add a new state to abbrev using .concat()
new_state = pd.DataFrame({'state':"New State", 'abbreviation': "NS"}, index=[0])


Unnamed: 0,state,abbreviation
47,Washington,WA
48,West Virginia,WV
49,Wisconsin,WI
50,Wyoming,WY
51,New State,NS


## .apply(), Series.map(), and DataFrame.map()

[.apply()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html) is a function that applies a function (that you pass in as an argument) along an axis of a `DataFrame`: to each column (`axis=0`, the default) or to each row (`axis=1`).
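
To make the two axes concrete, here is a sketch on a two-row toy frame (`toy`, `in_thousands`, and `density` are hypothetical names; the numbers are taken from the tables in this notebook):

```python
import pandas as pd

toy = pd.DataFrame({"population": [4833722.0, 735132.0],
                    "area (sq. mi)": [52423, 656425]},
                   index=["AL", "AK"])

# axis=0 (default): the function receives each column as a Series
in_thousands = toy.apply(lambda col: col / 1000)

# axis=1: the function receives each row as a Series
density = toy.apply(lambda row: row["population"] / row["area (sq. mi)"], axis=1)

print(in_thousands)
print(density)
```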

In [13]:
# Use .apply() on merged_df to divide population and area by 1000
#hint: .apply() works on all the columns, but not the index. You can set state/region as the index



Unnamed: 0_level_0,population,area (sq. mi)
state/region,Unnamed: 1_level_1,Unnamed: 2_level_1
AL,4833.722,52.423
AK,735.132,656.425
AZ,6626.624,114.006
AR,2959.373,53.182
CA,38332.521,163.707


[Series.map()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html) is a function that can be used to apply a function to each value of a `Series`, or to map each value to another defined value using a dictionary or another `Series`.
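
Both uses can be sketched on a short made-up Series (`states`, `lower`, and `to_abbrev` are illustrative names only):

```python
import pandas as pd

states = pd.Series(["Alabama", "Alaska", "Arizona"], name="state")

# Pass a function: it is called on each value
lower = states.map(lambda s: s.lower())

# Pass a dict: each value is looked up; values missing from the dict become NaN
to_abbrev = states.map({"Alabama": "AL", "Alaska": "AK"})

print(lower)
print(to_abbrev)
```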

In [14]:
# use .map() on the series state from abbrev to make the state name lowercase


0       alabama
1        alaska
2       arizona
3      arkansas
4    california
Name: state, dtype: object

[DataFrame.map()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.map.html) applies a function element-by-element to an entire DataFrame. It was called `.applymap()` before pandas 2.1.0, so with an older version (such as the 2.0.3 printed above) use `.applymap()` instead.
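
A hedged sketch that should run on either side of the rename; the `hasattr` check and the name `elementwise` are just one way to pick whichever method the installed pandas provides:

```python
import pandas as pd

toy = pd.DataFrame({"state": ["Alabama", "Alaska"],
                    "abbreviation": ["AL", "AK"]})

# DataFrame.map exists from pandas 2.1.0; older versions only have applymap
elementwise = toy.map if hasattr(toy, "map") else toy.applymap

# The function is called once per cell
lowered = elementwise(lambda value: value.lower())
print(lowered)
```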

In [15]:
# use .map (or .applymap(), if you haven't updated your pandas) to make the entire abbrev df lowercase


Unnamed: 0,state,abbreviation
0,alabama,al
1,alaska,ak
2,arizona,az
3,arkansas,ar
4,california,ca


## .groupby(), .agg() and .pivot_table()

[.groupby()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) is a function that splits a `DataFrame` into groups based on the values in one or more columns so that you can build a summary table. In order to use `.groupby()`, you'll need to indicate how to combine the data within each group into the summary you want. You can use functions like `.sum()` or `.mean()`.
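
The split-then-summarize pattern might look like this on a toy stand-in for `pops` (the `AL` rows are copied from the table above; the `XX` rows and the names `pops_toy`, `one_year`, and `summed` are made up for illustration):

```python
import pandas as pd

# Toy stand-in for pops: AL rows come from the table above, "XX" rows are made up
pops_toy = pd.DataFrame({
    "state/region": ["AL", "AL", "AL", "AL", "XX", "XX"],
    "ages": ["under18", "total", "under18", "total", "under18", "total"],
    "year": [2012, 2012, 2010, 2010, 2012, 2012],
    "population": [1117489.0, 4817528.0, 1130966.0, 4785570.0, 100.0, 400.0],
})

# Filter to one year first, then group and choose the aggregation
one_year = pops_toy[pops_toy["year"] == 2012]
summed = one_year.groupby(["ages", "state/region"])["population"].sum()
print(summed)
```

Grouping by two columns gives a result indexed by both, one row per (ages, state/region) pair.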

In [16]:
# for the year 1996, .groupby() population by ages and state/region and get the sum 


Unnamed: 0_level_0,Unnamed: 1_level_0,year,population
ages,state/region,Unnamed: 2_level_1,Unnamed: 3_level_1
total,AK,1996,608569.0
total,AL,1996,4331103.0
total,AR,1996,2572109.0
total,AZ,1996,4586940.0
total,CA,1996,32018834.0
...,...,...,...
under18,VT,1996,151490.0
under18,WA,1996,1449613.0
under18,WI,1996,1352877.0
under18,WV,1996,422831.0


[.agg()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html) is a function that aggregates information over an axis: `axis=0` (the default) aggregates down each column, and `axis=1` aggregates across each row. You'll need to specify how things should be aggregated by passing one or more functions, like 'sum', 'min', or 'max'. It is often chained after `.groupby()`.
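
As an illustrative sketch (using the five `AL` rows shown near the top of this notebook; `pops_toy`, `overall`, and `by_age` are hypothetical names), `.agg()` can be called on a DataFrame directly or after a `.groupby()`:

```python
import pandas as pd

# The five AL rows shown in the pops table above
pops_toy = pd.DataFrame({
    "ages": ["under18", "total", "under18", "total", "under18"],
    "year": [2012, 2012, 2010, 2010, 2011],
    "population": [1117489.0, 4817528.0, 1130966.0, 4785570.0, 1125763.0],
})

# On a DataFrame: several functions at once, one output row per function
overall = pops_toy[["population"]].agg(["min", "max"])

# After groupby: a dict chooses the function for each column,
# so the year column is simply left out of the result
by_age = pops_toy.groupby("ages").agg({"population": "mean"})

print(overall)
print(by_age)
```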

In [17]:
# use .groupby(['ages', 'state/region']) and then .agg() to get the mean - then drop the year column (bc a mean year is not entirely useful here)



Unnamed: 0_level_0,Unnamed: 1_level_0,population
ages,state/region,Unnamed: 2_level_1
total,AK,6.462048e+05
total,AL,4.484528e+06
total,AR,2.693178e+06
total,AZ,5.294600e+06
total,CA,3.433414e+07
...,...,...
under18,VT,1.409114e+05
under18,WA,1.492880e+06
under18,WI,1.344394e+06
under18,WV,4.049168e+05


[.pivot_table()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html) is a function that allows you to create a pivot table like in Microsoft Excel.
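
A rough sketch of how a pivot table relates to `.groupby()`, again on the five `AL` rows from the table near the top of this notebook (`pops_toy`, `pivot`, `grouped`, and `wide` are names invented for this example):

```python
import pandas as pd

# The five AL rows shown in the pops table above
pops_toy = pd.DataFrame({
    "state/region": ["AL"] * 5,
    "ages": ["under18", "total", "under18", "total", "under18"],
    "year": [2012, 2012, 2010, 2010, 2011],
    "population": [1117489.0, 4817528.0, 1130966.0, 4785570.0, 1125763.0],
})

# index= plays the role of the groupby keys, aggfunc= the aggregation
pivot = pops_toy.pivot_table(values="population",
                             index=["ages", "state/region"],
                             aggfunc="mean")

# The equivalent groupby, for comparison
grouped = pops_toy.groupby(["ages", "state/region"])[["population"]].mean()

# columns= spreads one key across the columns instead of the rows
wide = pops_toy.pivot_table(values="population", index="state/region",
                            columns="ages", aggfunc="mean")

print(pivot)
print(wide)
```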

In [18]:
# use .pivot_table() to create the same output as above - the mean population over all years indexed by age and state/region


Unnamed: 0_level_0,Unnamed: 1_level_0,population
ages,state/region,Unnamed: 2_level_1
total,AK,6.462048e+05
total,AL,4.484528e+06
total,AR,2.693178e+06
total,AZ,5.294600e+06
total,CA,3.433414e+07
...,...,...
under18,VT,1.409114e+05
under18,WA,1.492880e+06
under18,WI,1.344394e+06
under18,WV,4.049168e+05


## Next Steps

- Open up a new .ipynb (or .py, whatever you prefer) file.

- Try some practice problems from [101 Pandas Practice problems](https://www.machinelearningplus.com/python/101-pandas-exercises-python/). If there are specific skills or functions you've been struggling with, use `ctrl+F` to search for key terms associated with them and find related problems.

**Notes:**<br />
&emsp; - Each problem has a 'difficulty level'. This is subjective, so take it with a grain of salt. <br />
&emsp; - Some of the problems give you code you'll need to get started with along with the prompt. <br />
&emsp; - All problems have a `show solution` button. Try to struggle through problems before you show the solution - that's where the learning happens!<br />
&emsp; - There are usually multiple ways to do things. If you achieve the same result with a different method than the solution, give yourself a high five!