# Working With Pandas Data Frames
---

* Import the pandas library as pd and read in the csv that we just wrote as a data frame using the `read_csv`.

In [1]:
import pandas as pd
df = pd.read_csv('./gapminder_final.csv')

* return the data frame from jupyter to view it (pretty)

In [2]:
df

Unnamed: 0.1,Unnamed: 0,country,year,pop,continent,lifeExp,gdpPercap
0,0,Afghanistan,1997,22227415.0,Asia,41.763,635.341351
1,1,Afghanistan,2002,25268405.0,Asia,42.129,726.734055
2,2,Afghanistan,2007,31889923.0,Asia,43.828,974.580338
3,3,Afghanistan,1952,8425333.0,Asia,28.801,779.445314
4,4,Afghanistan,1957,9240934.0,Asia,30.332,820.853030
5,5,Afghanistan,1962,10267083.0,Asia,31.997,853.100710
6,6,Afghanistan,1967,11537966.0,Asia,34.020,836.197138
7,7,Afghanistan,1972,13079460.0,Asia,36.088,739.981106
8,8,Afghanistan,1977,14880372.0,Asia,38.438,786.113360
9,9,Afghanistan,1982,12881816.0,Asia,39.854,978.011439


* What if we wanted to break up all the data into files for each continent?
* What if we wanted to have only one row for each country?
  * We would have to have columns for each year
  
** Don't worry about all of the code in the block below. By the end of this section it will make sense. **

In [4]:
allcountries_df = pd.read_csv("./gapminder_final.csv")
df_t = allcountries_df

years = sorted(list(df_t['year'].unique()))
# Convert years to strings

for year in years:
    col_name = "gdpPercap_" + str(year)
    df_t[col_name] = 0

for y in years:
    df_t.loc[df_t.year == y, "gdpPercap_" + str(y)] = df_t["gdpPercap"]

df_t = df_t.groupby(df_t["country"]).max()
df_t["pop2007"] = df_t["pop"]
df_t["lifeExp2007"] = df_t["lifeExp"]

df_t.drop("country", axis=1, inplace=True)
df_t.drop("gdpPercap", axis=1, inplace=True)
df_t.drop("year", axis=1, inplace=True)
df_t.drop("lifeExp", axis=1, inplace=True)

continents = df_t["continent"].unique()

df_t.to_csv("./gapminder_All.csv")

for continent in continents:
    new_df = df_t.loc[df_t["continent"] == continent]
    newfilename = "gapminder_" + continent + ".csv"
    #new_df.to_csv("./python-lessons" + newfilename)
    new_df.to_csv(newfilename)
    df.to_csv("./python-lessons")

* Check out our data folder after we ran this script. We now have six additional csv files.
    ~~~
    * gapminder_All.csv
    * gapminder_Oceana.csv
    * gapminder_Europe.csv
    * gapminder_Asia.csv
    * gapminder_Americas.csv
    * gapminder_Africa.csv
    ~~~

Lets read in the Africa csv to see how the structure has changed.

In [11]:
# Read in gapminder_Africa.csv as a data frame
df_africa = pd.read_csv('./gapminder_Africa.csv')
df_africa

Unnamed: 0.1,country,Unnamed: 0,pop,continent,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007,pop2007,lifeExp2007
0,Algeria,35,33333216.0,Africa,2449.008185,3013.976023,2550.81688,3246.991771,4182.663766,4910.416756,5745.160213,5681.358539,5023.216647,4797.295051,5288.040382,6223.367465,33333216.0,72.301
1,Angola,47,12420476.0,Africa,3520.610273,3827.940465,4269.276742,5522.776375,5473.288005,3008.647355,2756.953672,2430.208311,2627.845685,2277.140884,2773.287312,4797.231267,12420476.0,42.731
2,Benin,131,8078314.0,Africa,1062.7522,959.60108,949.499064,1035.831411,1085.796879,1029.161251,1277.897616,1225.85601,1191.207681,1232.975292,1372.877931,1441.284873,8078314.0,56.728
3,Botswana,167,1639131.0,Africa,851.241141,918.232535,983.653976,1214.709294,2263.611114,3214.857818,4551.14215,6205.88385,7954.111645,8647.142313,11003.60508,12569.85177,1639131.0,63.622
4,Burkina Faso,203,14326203.0,Africa,543.255241,617.183465,722.512021,794.82656,854.735976,743.387037,807.198586,912.063142,931.752773,946.294962,1037.645221,1217.032994,14326203.0,52.295
5,Burundi,215,8390505.0,Africa,339.296459,379.564628,355.203227,412.977514,464.099504,556.103265,559.603231,621.818819,631.699878,463.115148,446.403513,430.070692,8390505.0,49.58
6,Cameroon,239,17696293.0,Africa,1172.667655,1313.048099,1399.607441,1508.453148,1684.146528,1783.432873,2367.983282,2602.664206,1793.163278,1694.337469,1934.011449,2042.09524,17696293.0,54.985
7,Central African Republic,263,4369038.0,Africa,1071.310713,1190.844328,1193.068753,1136.056615,1070.013275,1109.374338,956.752991,844.87635,747.905525,740.506332,738.690607,706.016537,4369038.0,50.485
8,Chad,275,10238807.0,Africa,1178.665927,1308.495577,1389.817618,1196.810565,1104.103987,1133.98495,797.908101,952.386129,1058.0643,1004.961353,1156.18186,1704.063724,10238807.0,51.724
9,Comoros,323,710960.0,Africa,1102.990936,1211.148548,1406.648278,1876.029643,1937.577675,1172.603047,1267.100083,1315.980812,1246.90737,1173.618235,1075.811558,986.147879,710960.0,65.152


We can see that now there is only one row for each country. Lets re-read the file using the index_col parameter

In [12]:
# Re-read the file using index_cols
#df_africa = pd.read_csv('gapminder_Africa.csv', index_col='country')
#df_africa

df_all = pd.read_csv('gapminder_All.csv')#, index_col='country')
df_all


Unnamed: 0.1,country,Unnamed: 0,pop,continent,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007,pop2007,lifeExp2007
0,Afghanistan,11,3.188992e+07,Asia,779.445314,820.853030,853.100710,836.197138,739.981106,786.113360,978.011439,852.395945,649.341395,635.341351,726.734055,974.580338,3.188992e+07,43.828
1,Albania,23,3.600523e+06,Europe,1601.056136,1942.284244,2312.888958,2760.196931,3313.422188,3533.003910,3630.880722,3738.932735,2497.437901,3193.054604,4604.211737,5937.029526,3.600523e+06,76.423
2,Algeria,35,3.333322e+07,Africa,2449.008185,3013.976023,2550.816880,3246.991771,4182.663766,4910.416756,5745.160213,5681.358539,5023.216647,4797.295051,5288.040382,6223.367465,3.333322e+07,72.301
3,Angola,47,1.242048e+07,Africa,3520.610273,3827.940465,4269.276742,5522.776375,5473.288005,3008.647355,2756.953672,2430.208311,2627.845685,2277.140884,2773.287312,4797.231267,1.242048e+07,42.731
4,Argentina,59,4.030193e+07,Americas,5911.315053,6856.856212,7133.166023,8052.953021,9443.038526,10079.026740,8997.897412,9139.671389,9308.418710,10967.281950,8797.640716,12779.379640,4.030193e+07,75.320
5,Australia,71,2.043418e+07,Oceania,10039.595640,10949.649590,12217.226860,14526.124650,16788.629480,18334.197510,19477.009280,21888.889030,23424.766830,26997.936570,30687.754730,34435.367440,2.043418e+07,81.235
6,Austria,83,8.199783e+06,Europe,6137.076492,8842.598030,10750.721110,12834.602400,16661.625600,19749.422300,21597.083620,23687.826070,27042.018680,29095.920660,32417.607690,36126.492700,8.199783e+06,79.829
7,Bahrain,95,7.085730e+05,Asia,9867.084765,11635.799450,12753.275140,14804.672700,18268.658390,19340.101960,19211.147310,18524.024060,19035.579170,20292.016790,23403.559270,29796.048340,7.085730e+05,75.635
8,Bangladesh,107,1.504483e+08,Asia,684.244172,661.637458,686.341554,721.186086,630.233627,659.877232,676.981866,751.979403,837.810164,972.770035,1136.390430,1391.253792,1.504483e+08,64.062
9,Belgium,119,1.039223e+07,Europe,8343.105127,9714.960623,10991.206760,13149.041190,16672.143560,19117.974480,20979.845890,22525.563080,25575.570690,27561.196630,30485.883750,33692.605080,1.039223e+07,79.441


## Inspecting Data
We've already see how we can get information about a dataframe using the `.info()` and `.describe()` functions, but there are many ways to get information and view a data frame

* Can also use describe() on data frame selections (like a single column)

In [13]:
# Use the .describe() function on the "gdpPercap" column
#df_africa.info()
df_all.describe()

Unnamed: 0.1,Unnamed: 0,pop,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007,pop2007,lifeExp2007
count,142.0,142.0,142.0,142.0,142.0,142.0,142.0,142.0,142.0,142.0,142.0,142.0,142.0,142.0,142.0,142.0
mean,857.0,44049200.0,3725.276046,4299.408345,4725.812342,5483.653047,6770.082815,7313.166421,7518.901673,7900.920218,8158.608521,9090.175363,9917.848365,11680.07182,44049200.0,68.035415
std,493.631441,147615900.0,9321.064786,9869.662202,8667.362525,8095.315431,10614.383403,8362.48915,7733.845006,8288.281304,9031.84608,10171.493263,11154.114865,12859.937337,147615900.0,10.784702
min,11.0,199579.0,298.846212,335.997115,355.203227,349.0,357.0,371.0,424.0,385.0,347.0,312.188423,241.165877,277.551859,199579.0,42.568
25%,434.0,4508559.0,864.752389,930.540819,1059.149171,1151.245103,1257.193853,1357.257252,1363.338985,1327.469823,1270.660958,1366.837958,1409.567264,1624.842248,4508559.0,59.6945
50%,857.0,10674190.0,1968.528344,2173.220291,2335.439533,2678.334741,3339.129407,3798.609244,4216.228428,4280.300366,4386.085502,4781.825478,5319.804524,6124.371109,10674190.0,71.9355
75%,1280.0,31210040.0,3913.492777,4876.356362,5709.381428,7075.932943,9508.839303,11204.102423,12347.953723,11994.052795,10684.35187,12022.867188,13359.512257,18008.83564,31210040.0,76.41325
max,1703.0,1318683000.0,108382.3529,113523.1329,95458.11176,80894.88326,109347.867,59265.47714,33693.17525,31540.9748,34932.91959,41283.16433,44683.97525,49357.19017,1318683000.0,82.603


* We can print the first or last x number of rows of our data frame using the head() and tails() functions.

In [14]:
# heads() function
#df_africa.head()
df_all.head(10)

Unnamed: 0.1,country,Unnamed: 0,pop,continent,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007,pop2007,lifeExp2007
0,Afghanistan,11,31889923.0,Asia,779.445314,820.85303,853.10071,836.197138,739.981106,786.11336,978.011439,852.395945,649.341395,635.341351,726.734055,974.580338,31889923.0,43.828
1,Albania,23,3600523.0,Europe,1601.056136,1942.284244,2312.888958,2760.196931,3313.422188,3533.00391,3630.880722,3738.932735,2497.437901,3193.054604,4604.211737,5937.029526,3600523.0,76.423
2,Algeria,35,33333216.0,Africa,2449.008185,3013.976023,2550.81688,3246.991771,4182.663766,4910.416756,5745.160213,5681.358539,5023.216647,4797.295051,5288.040382,6223.367465,33333216.0,72.301
3,Angola,47,12420476.0,Africa,3520.610273,3827.940465,4269.276742,5522.776375,5473.288005,3008.647355,2756.953672,2430.208311,2627.845685,2277.140884,2773.287312,4797.231267,12420476.0,42.731
4,Argentina,59,40301927.0,Americas,5911.315053,6856.856212,7133.166023,8052.953021,9443.038526,10079.02674,8997.897412,9139.671389,9308.41871,10967.28195,8797.640716,12779.37964,40301927.0,75.32
5,Australia,71,20434176.0,Oceania,10039.59564,10949.64959,12217.22686,14526.12465,16788.62948,18334.19751,19477.00928,21888.88903,23424.76683,26997.93657,30687.75473,34435.36744,20434176.0,81.235
6,Austria,83,8199783.0,Europe,6137.076492,8842.59803,10750.72111,12834.6024,16661.6256,19749.4223,21597.08362,23687.82607,27042.01868,29095.92066,32417.60769,36126.4927,8199783.0,79.829
7,Bahrain,95,708573.0,Asia,9867.084765,11635.79945,12753.27514,14804.6727,18268.65839,19340.10196,19211.14731,18524.02406,19035.57917,20292.01679,23403.55927,29796.04834,708573.0,75.635
8,Bangladesh,107,150448339.0,Asia,684.244172,661.637458,686.341554,721.186086,630.233627,659.877232,676.981866,751.979403,837.810164,972.770035,1136.39043,1391.253792,150448339.0,64.062
9,Belgium,119,10392226.0,Europe,8343.105127,9714.960623,10991.20676,13149.04119,16672.14356,19117.97448,20979.84589,22525.56308,25575.57069,27561.19663,30485.88375,33692.60508,10392226.0,79.441


In [32]:
# tails() function
#df_africa.tail()
df_all.tail(3)

Unnamed: 0_level_0,Unnamed: 0,pop,continent,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007,pop2007,lifeExp2007
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
Yemen Rep.,1679,22211743.0,Asia,781.717576,804.830455,825.623201,862.442146,1265.047031,1829.765177,1977.55701,1971.741538,1879.496673,2117.484526,2234.820827,2280.769906,22211743.0,62.698
Zambia,1691,11746035.0,Africa,1147.388831,1311.956766,1452.725766,1777.077318,1773.498265,1588.688299,1408.678565,1213.315116,1210.884633,1071.353818,1071.613938,1271.211593,11746035.0,51.821
Zimbabwe,1703,12311143.0,Africa,406.884115,518.764268,527.272182,569.795071,799.362176,685.587682,788.855041,706.157306,693.420786,792.44996,672.038623,469.709298,12311143.0,62.351


* **Note:** head() and tail() default to 5 rows worth of data. We can also pass a value to either to get more or less rows.
  * E.g.  
  head(10)
  
 ---
  
* Use `.dtypes` method to get the data type for each column

In [15]:
# Write your code here
#df_africa.dtypes
df_all.dtypes

country            object
Unnamed: 0          int64
pop               float64
continent          object
gdpPercap_1952    float64
gdpPercap_1957    float64
gdpPercap_1962    float64
gdpPercap_1967    float64
gdpPercap_1972    float64
gdpPercap_1977    float64
gdpPercap_1982    float64
gdpPercap_1987    float64
gdpPercap_1992    float64
gdpPercap_1997    float64
gdpPercap_2002    float64
gdpPercap_2007    float64
pop2007           float64
lifeExp2007       float64
dtype: object

* Use `shape` method to get the row and column numbers

In [16]:
# Write your code here
#df_africa.shape
df_all.shape


(142, 18)

Here we can see the that the data have 52 rows of data and 18 attributes worth of information.

* Use the `len()` function to get numbers of each individually

In [18]:
# print number of rows of data
len(df_all.index)

142

In [19]:
# print number of columns of data
len(df_all.columns)

18

* As seen before we can pass the data frame a column name to get all values for that column


* Print out only the lifeExp2007 column from the data frame

In [20]:
# Write your code here
df_all['lifeExp2007']

0      43.828
1      76.423
2      72.301
3      42.731
4      75.320
5      81.235
6      79.829
7      75.635
8      64.062
9      79.441
10     56.728
11     65.554
12     74.852
13     63.622
14     72.390
15     73.005
16     52.295
17     49.580
18     59.723
19     54.985
20     80.653
21     50.485
22     51.724
23     78.553
24     72.961
25     72.889
26     65.152
27     47.804
28     57.470
29     78.782
        ...  
112    42.568
113    79.972
114    74.663
115    77.926
116    48.159
117    61.888
118    80.941
119    72.396
120    58.556
121    58.474
122    80.884
123    81.701
124    74.143
125    78.400
126    52.517
127    70.616
128    58.420
129    69.862
130    73.923
131    71.777
132    51.542
133    79.425
134    78.242
135    76.384
136    73.747
137    74.249
138    73.422
139    62.698
140    51.821
141    62.351
Name: lifeExp2007, Length: 142, dtype: float64

---
## EXERCISE:
1. How many different country names are there in the gapminder_all.csv file?

---

In [24]:
# Write your code here
countries = sorted(list(df_all['country'].unique()))
#years = sorted(list(df_t['year'].unique()))
#df_all
len(countries)

142

---
## EXERCISE:
1. What is the last country listed in the Africa data frame?

---

In [29]:
# Write your code here
df_africa = pd.read_csv('gapminder_Africa.csv')
df_africa
africanCountries = sorted(list(df_africa['country'].unique()))
df_africa.tail(1)

Unnamed: 0.1,country,Unnamed: 0,pop,continent,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007,pop2007,lifeExp2007
51,Zimbabwe,1703,12311143.0,Africa,406.884115,518.764268,527.272182,569.795071,799.362176,685.587682,788.855041,706.157306,693.420786,792.44996,672.038623,469.709298,12311143.0,62.351


---
## Get information about a particular column

* Operations like mean, max, min, can be used on individual columns
  * E.g. `df['year'].min()`

In [31]:
# What is the minimum value in the year column?
df_all2 = pd.read_csv('gapminder_final.csv')
df_all2['year'].min()

1952

---
## EXERCISE:
1. What is the mean life expectancy in 2007 for all countries in all years

1. What is the max population in 2007 for all countries in all years

1. What is the standard deviation for GDP per capita for all countries in all years


In [39]:
df3 = pd.read_csv("gapminder_all.csv")
# Mean Life Expectancy in 2007
avlifeexp = df3['lifeExp2007'].mean()
print(avlifeexp)
# Max Population in 2007
maxpop=df3['pop2007'].max()
print(maxpop)
# Standard Deviation of GDP per capita in 1952
df3.columns
df3['gdpPercap_1952'].std()

68.03541549295775
1318683096.0


9321.064786416604

## Rearange Columns

* This is difficult to do using a csv library or by hand
* The reverse() function will reverse the ordering of a list
     * E.g.   `['a', 'b', 'c']` to `['c', 'b', 'a']`

In [41]:
# Use the python list() function to get the data frame columns a s list
cols = list(df3.columns)
print( cols )

# use the .reverse() function to reverse the ordering of the columns
cols.reverse()
print ( cols )

['country', 'Unnamed: 0', 'pop', 'continent', 'gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962', 'gdpPercap_1967', 'gdpPercap_1972', 'gdpPercap_1977', 'gdpPercap_1982', 'gdpPercap_1987', 'gdpPercap_1992', 'gdpPercap_1997', 'gdpPercap_2002', 'gdpPercap_2007', 'pop2007', 'lifeExp2007']
['lifeExp2007', 'pop2007', 'gdpPercap_2007', 'gdpPercap_2002', 'gdpPercap_1997', 'gdpPercap_1992', 'gdpPercap_1987', 'gdpPercap_1982', 'gdpPercap_1977', 'gdpPercap_1972', 'gdpPercap_1967', 'gdpPercap_1962', 'gdpPercap_1957', 'gdpPercap_1952', 'continent', 'pop', 'Unnamed: 0', 'country']


* Using that now reversed list above, we can pass the list to the new data frame to reorder the columns

In [46]:
# Pass the cols variable to the data frame to re-arrange the columns
import pandas as pd
newDF = df3[cols]
newDF

Unnamed: 0.1,lifeExp2007,pop2007,gdpPercap_2007,gdpPercap_2002,gdpPercap_1997,gdpPercap_1992,gdpPercap_1987,gdpPercap_1982,gdpPercap_1977,gdpPercap_1972,gdpPercap_1967,gdpPercap_1962,gdpPercap_1957,gdpPercap_1952,continent,pop,Unnamed: 0,country
0,43.828,3.188992e+07,974.580338,726.734055,635.341351,649.341395,852.395945,978.011439,786.113360,739.981106,836.197138,853.100710,820.853030,779.445314,Asia,3.188992e+07,11,Afghanistan
1,76.423,3.600523e+06,5937.029526,4604.211737,3193.054604,2497.437901,3738.932735,3630.880722,3533.003910,3313.422188,2760.196931,2312.888958,1942.284244,1601.056136,Europe,3.600523e+06,23,Albania
2,72.301,3.333322e+07,6223.367465,5288.040382,4797.295051,5023.216647,5681.358539,5745.160213,4910.416756,4182.663766,3246.991771,2550.816880,3013.976023,2449.008185,Africa,3.333322e+07,35,Algeria
3,42.731,1.242048e+07,4797.231267,2773.287312,2277.140884,2627.845685,2430.208311,2756.953672,3008.647355,5473.288005,5522.776375,4269.276742,3827.940465,3520.610273,Africa,1.242048e+07,47,Angola
4,75.320,4.030193e+07,12779.379640,8797.640716,10967.281950,9308.418710,9139.671389,8997.897412,10079.026740,9443.038526,8052.953021,7133.166023,6856.856212,5911.315053,Americas,4.030193e+07,59,Argentina
5,81.235,2.043418e+07,34435.367440,30687.754730,26997.936570,23424.766830,21888.889030,19477.009280,18334.197510,16788.629480,14526.124650,12217.226860,10949.649590,10039.595640,Oceania,2.043418e+07,71,Australia
6,79.829,8.199783e+06,36126.492700,32417.607690,29095.920660,27042.018680,23687.826070,21597.083620,19749.422300,16661.625600,12834.602400,10750.721110,8842.598030,6137.076492,Europe,8.199783e+06,83,Austria
7,75.635,7.085730e+05,29796.048340,23403.559270,20292.016790,19035.579170,18524.024060,19211.147310,19340.101960,18268.658390,14804.672700,12753.275140,11635.799450,9867.084765,Asia,7.085730e+05,95,Bahrain
8,64.062,1.504483e+08,1391.253792,1136.390430,972.770035,837.810164,751.979403,676.981866,659.877232,630.233627,721.186086,686.341554,661.637458,684.244172,Asia,1.504483e+08,107,Bangladesh
9,79.441,1.039223e+07,33692.605080,30485.883750,27561.196630,25575.570690,22525.563080,20979.845890,19117.974480,16672.143560,13149.041190,10991.206760,9714.960623,8343.105127,Europe,1.039223e+07,119,Belgium


## Transposing tables

In many cases we may need to transpose the column and rows in a table.  Pandas allows us to to this easily with the `T` command.


In [48]:
# Print first three rows of the data frame
newDF.head(3)
# Transpose the dataframe and print the first three rows
newDF_trans = newDF.T
newDF_trans.head(3)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,132,133,134,135,136,137,138,139,140,141
lifeExp2007,43.828,76.423,72.301,42.731,75.32,81.235,79.829,75.635,64.062,79.441,...,51.542,79.425,78.242,76.384,73.747,74.249,73.422,62.698,51.821,62.351
pop2007,31889900.0,3600520.0,33333200.0,12420500.0,40301900.0,20434200.0,8199780.0,708573.0,150448000.0,10392200.0,...,29170400.0,60776200.0,301140000.0,3447500.0,26084700.0,85262400.0,4018330.0,22211700.0,11746000.0,12311100.0
gdpPercap_2007,974.58,5937.03,6223.37,4797.23,12779.4,34435.4,36126.5,29796.0,1391.25,33692.6,...,1056.38,33203.3,42951.7,10611.5,11415.8,2441.58,3025.35,2280.77,1271.21,469.709


---
## EXERCISE:
1. Read in a new data frame for the gapminder_Americas.csv file
2. Print the last last three columns of the data frame

In [73]:
df_amer = pd.read_csv('gapminder_Americas.csv', index_col = 'country')
#df_amer
len(df_amer.columns)
df_amer.columns

df_amer.loc[:, 'gdpPercap_2007':'lifeExp2007']


Unnamed: 0_level_0,gdpPercap_2007,pop2007,lifeExp2007
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Argentina,12779.37964,40301927.0,75.32
Bolivia,3822.137084,9119152.0,65.554
Brazil,9065.800825,190010647.0,72.39
Canada,36319.23501,33390141.0,80.653
Chile,13171.63885,16284741.0,78.553
Colombia,7006.580419,44227550.0,72.889
Costa Rica,9645.06142,4133884.0,78.782
Cuba,8948.102923,11416987.0,78.273
Dominican Republic,6025.374752,9319622.0,72.235
Ecuador,6873.262326,13755680.0,74.994


---
## Selecting values

Data Frames provides a index as a way to identify the rows of the table. A row also has a position inside the table as well as a label, which uniquely identifies its entry in the DataFrame.

To access a value at the position [ i , j ] (row, column) of a Data Frame, we have two options, depending on what is the meaning of i in use.

### Use DataFrame.iloc[..., ...] to select values by their position
* Allows you to specify location by numerical index similar to 2D version of character selection in strings.


In [61]:
print("\nData value in first row at first column: ", df.iloc[0, 0])


Data value in first row at first column:  0


In [62]:
print("\nData value in fifth row at third column: ", df.iloc[4, 2])


Data value in fifth row at third column:  1957


### Use `DataFrame.loc[..., ...]` to select values by their (index) label.

*   Can specify location by name or by numerical index.

In [82]:
# Print the value of Albanias GDP per capita in 1952 using .loc
df_all = pd.read_csv('gapminder_All.csv', index_col='country')
print(df_all.loc['Albania','gdpPercap_1952'])
#df_all



1601.056136


In [83]:
# Print the value of Bulgarias GDP per capita in 1962 using .loc
print(df_all.loc['Bulgaria','gdpPercap_1962'])


4254.337839


---
## EXERCISE
~~~
import pandas
df = pandas.read_csv('data/gapminder_Europe.csv', index_col='country')
~~~

1. Find the Per Capita GDP for Serbia in 2007.
1. Find the Per Capita GDPs for Germany in 1982

In [90]:
# Write your code here
import pandas as pd

df_euro = pd.read_csv('gapminder_Europe.csv', index_col = 'country')
print('Serbia\'s GDP in 2007 was: '+ str(df_euro.loc['Serbia','gdpPercap_2007']))
print('Germany\'s GDP in 1982 was:', df_euro.loc['Germany','gdpPercap_1982']) #this print auto adds space

Serbia's GDP in 2007 was: 9786.534714
Germany's GDP in 1982 was: 22031.532740000002


---
## Slicing Data Frames

### Use `:` on its own to mean all columns or all rows.

*   Just like Python's usual slicing notation, we can print all columns or all rows with `.loc` using the `:`

In [102]:
# Print all of the row values for all years for Egypt
import pandas as pd
df_all3 = pd.read_csv('gapminder_All.csv', index_col='country')
df_all3.loc['Egypt',:]

Unnamed: 0                467
pop               8.02645e+07
continent              Africa
gdpPercap_1952        1418.82
gdpPercap_1957        1458.92
gdpPercap_1962        1693.34
gdpPercap_1967        1814.88
gdpPercap_1972        2024.01
gdpPercap_1977        2785.49
gdpPercap_1982        3503.73
gdpPercap_1987        3885.46
gdpPercap_1992        3794.76
gdpPercap_1997        4173.18
gdpPercap_2002         4754.6
gdpPercap_2007        5581.18
pop2007           8.02645e+07
lifeExp2007            71.338
Name: Egypt, dtype: object

* Would get the same result printing `df.iloc[0]` (without a second index).
* We can also omit the `:` and get all rows.
    * e.g. `df.loc["Albania"]`

In [99]:
df_all3.iloc[0]

Unnamed: 0                 11
pop               3.18899e+07
continent                Asia
gdpPercap_1952        779.445
gdpPercap_1957        820.853
gdpPercap_1962        853.101
gdpPercap_1967        836.197
gdpPercap_1972        739.981
gdpPercap_1977        786.113
gdpPercap_1982        978.011
gdpPercap_1987        852.396
gdpPercap_1992        649.341
gdpPercap_1997        635.341
gdpPercap_2002        726.734
gdpPercap_2007         974.58
pop2007           3.18899e+07
lifeExp2007            43.828
Name: Afghanistan, dtype: object

In [106]:
# Print the GDP per capita for all countries in 1952
df_all3 = pd.read_csv('gapminder_All.csv',index_col='country')
df_all3.loc[:,'gdpPercap_1952']

country
Afghanistan                   779.445314
Albania                      1601.056136
Algeria                      2449.008185
Angola                       3520.610273
Argentina                    5911.315053
Australia                   10039.595640
Austria                      6137.076492
Bahrain                      9867.084765
Bangladesh                    684.244172
Belgium                      8343.105127
Benin                        1062.752200
Bolivia                      2677.326347
Bosnia and Herzegovina        973.533195
Botswana                      851.241141
Brazil                       2108.944355
Bulgaria                     2444.286648
Burkina Faso                  543.255241
Burundi                       339.296459
Cambodia                      368.469286
Cameroon                     1172.667655
Canada                      11367.161120
Central African Republic     1071.310713
Chad                         1178.665927
Chile                        3939.978789
China   

*   Would get the same result printing `df["gdpPercap_1952"]`
*   Also get the same result printing `df.gdpPercap_1952` (since it's a column name)

In [108]:
#df_all3['gdpPercap_1952']
df_all3.gdpPercap_1952

country
Afghanistan                   779.445314
Albania                      1601.056136
Algeria                      2449.008185
Angola                       3520.610273
Argentina                    5911.315053
Australia                   10039.595640
Austria                      6137.076492
Bahrain                      9867.084765
Bangladesh                    684.244172
Belgium                      8343.105127
Benin                        1062.752200
Bolivia                      2677.326347
Bosnia and Herzegovina        973.533195
Botswana                      851.241141
Brazil                       2108.944355
Bulgaria                     2444.286648
Burkina Faso                  543.255241
Burundi                       339.296459
Cambodia                      368.469286
Cameroon                     1172.667655
Canada                      11367.161120
Central African Republic     1071.310713
Chad                         1178.665927
Chile                        3939.978789
China   

---
## EXERCISE:
1. Print out GDP per capita for all countries in asia in 1972



In [109]:
# Write your code here
df_asia = pd.read_csv('gapminder_Asia.csv',index_col='country')
df_asia.gdpPercap_1972

country
Afghanistan              739.981106
Bahrain                18268.658390
Bangladesh               630.233627
Cambodia                 421.624026
China                    676.900092
Hong Kong China         8315.928145
India                    724.032527
Indonesia               1111.107907
Iran                    9613.818607
Iraq                    9576.037596
Israel                 12786.932230
Japan                  14778.786360
Jordan                  2110.856309
Korea Dem. Rep.         3701.621503
Korea Rep.              3030.876650
Kuwait                109347.867000
Lebanon                 7486.384341
Malaysia                2849.094780
Mongolia                1421.741975
Myanmar                  357.000000
Nepal                    674.788130
Oman                   10618.038550
Pakistan                1049.938981
Philippines             1989.374070
Saudi Arabia           24837.428650
Singapore               8597.756202
Sri Lanka               1213.395530
Syria               

---
### We can also use the `:` to select whole sections of a table 
* Similar to the way we would select a section of from a normal python list, we can do the same with data frames.

In [110]:
# Print a selection of all countries in the table from Italy to Poland from years 1962 to 1972
df_all3.loc['Italy':'Poland']

Unnamed: 0_level_0,Unnamed: 0,pop,continent,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007,pop2007,lifeExp2007
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
Italy,779,58147733.0,Europe,4931.404155,6248.656232,8243.58234,10022.40131,12269.27378,14255.98475,16537.4835,19207.23482,22013.64486,24675.02446,27968.09817,28569.7197,58147733.0,80.546
Jamaica,791,2780132.0,Americas,2898.530881,4756.525781,5246.107524,6124.703451,7433.889293,6650.195573,6068.05135,6351.237495,7404.923685,7121.924704,6994.774861,7320.880262,2780132.0,72.567
Japan,803,127467972.0,Asia,3216.956347,4317.694365,6576.649461,9847.788607,14778.78636,16610.37701,19384.10571,22375.94189,26824.89511,28816.58499,28604.5919,31656.06806,127467972.0,82.603
Jordan,815,6053193.0,Asia,1546.907807,1886.080591,2348.009158,2741.796252,2110.856309,2852.351568,4161.415959,4448.679912,3431.593647,3645.379572,3844.917194,4519.461171,6053193.0,72.535
Kenya,827,35610177.0,Africa,853.540919,944.438315,896.966373,1056.736457,1222.359968,1267.613204,1348.225791,1361.936856,1341.921721,1360.485021,1287.514732,1463.249282,35610177.0,59.339
Korea Dem. Rep.,839,23301725.0,Asia,1088.277758,1571.134655,1621.693598,2143.540609,3701.621503,4106.301249,4106.525293,4106.492315,3726.063507,1690.756814,1646.758151,1593.06548,23301725.0,70.647
Korea Rep.,851,49044790.0,Asia,1030.592226,1487.593537,1536.344387,2029.228142,3030.87665,4657.22102,5622.942464,8533.088805,12104.27872,15993.52796,19233.98818,23348.13973,49044790.0,78.623
Kuwait,863,2505559.0,Asia,108382.3529,113523.1329,95458.11176,80894.88326,109347.867,59265.47714,31354.03573,28118.42998,34932.91959,40300.61996,35110.10566,47306.98978,2505559.0,77.588
Lebanon,875,3921278.0,Asia,4834.804067,6089.786934,5714.560611,6006.983042,7486.384341,8659.696836,7640.519521,5377.091329,6890.806854,8754.96385,9313.93883,10461.05868,3921278.0,71.993
Lesotho,887,2046772.0,Africa,298.846212,335.997115,411.800627,498.639026,496.581592,745.369541,797.263107,773.993214,977.486272,1186.147994,1275.184575,1569.331442,2046772.0,59.685


Note that in Pandas **slicing using indexes is inclusive at both ends**, which differs from typical python behavior where slicing indicates everything up to but not including the final index.

### Select multiple columns or rows using `DataFrame.iloc` and a named slice.
* We can also make selection from a data frame using the index location of the row or column
    * Remember that in programming languages, we start counting at 0

In [122]:
#Print the first row of the data frame using the .head() function
print("First row of data frame:\n", df_all3.head(1) )

# Use iloc to print the value in the first row of the first column
print("\n\nValue in the first row of the first column:\n", df_all3.iloc[0,0] )

# Use iloc to print the values of the first two columns in the first row
print("Values in the first two columes in the first row:\n", df_all3.iloc[0,0:2] )

First row of data frame:
              Unnamed: 0         pop continent  gdpPercap_1952  gdpPercap_1957  \
country                                                                         
Afghanistan          11  31889923.0      Asia      779.445314       820.85303   

             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972  gdpPercap_1977  \
country                                                                       
Afghanistan       853.10071      836.197138      739.981106       786.11336   

             gdpPercap_1982  gdpPercap_1987  gdpPercap_1992  gdpPercap_1997  \
country                                                                       
Afghanistan      978.011439      852.395945      649.341395      635.341351   

             gdpPercap_2002  gdpPercap_2007     pop2007  lifeExp2007  
country                                                               
Afghanistan      726.734055      974.580338  31889923.0       43.828  


Value in the first row of the first col

* **Note that unlike slicing using column or row names, slicing using indexes is not inclusive** 

---
## EXERCISE:
1. Print out all values from Hungary through Montenegro (Europe data) for the years 1977 through 1997

In [124]:
# Write code here
df_euro.loc['Hungary':'Montenego','gdpPercap_1977':'gdpPercap_1997']

Unnamed: 0_level_0,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Hungary,11674.83737,12545.99066,12986.47998,10535.62855,11712.7768
Iceland,19654.96247,23269.6075,26923.20628,25144.39201,28061.09966
Ireland,11150.98113,12618.32141,13872.86652,17558.81555,24521.94713
Italy,14255.98475,16537.4835,19207.23482,22013.64486,24675.02446


---
## EXERCISE:
1.  Do the two statements below produce the same output?
    ~~~
    print(df.iloc[0:2, 0:2])
    print(df.loc['Albania':'Belgium', 'gdpPercap_1952':'gdpPercap_1962'])
    ~~~

1.  Based on this,what rule governs what is included (or not) in numerical slices and named slices in Pandas?


In [126]:
print(df_all3.iloc[0:2, 0:2])
print('\nnext one coming\n')
print(df_all3.loc['Albania':'Belgium', 'gdpPercap_1952':'gdpPercap_1962'])

             Unnamed: 0         pop
country                            
Afghanistan          11  31889923.0
Albania              23   3600523.0

next one coming

            gdpPercap_1952  gdpPercap_1957  gdpPercap_1962
country                                                   
Albania        1601.056136     1942.284244     2312.888958
Algeria        2449.008185     3013.976023     2550.816880
Angola         3520.610273     3827.940465     4269.276742
Argentina      5911.315053     6856.856212     7133.166023
Australia     10039.595640    10949.649590    12217.226860
Austria        6137.076492     8842.598030    10750.721110
Bahrain        9867.084765    11635.799450    12753.275140
Bangladesh      684.244172      661.637458      686.341554
Belgium        8343.105127     9714.960623    10991.206760


---
### Slicing individual rows and columns
* Instead of creating slices of *this* to *that* using the `:`, we can also slice using individual rows and columns by placing names or indexes in brackets `[]`.

In [127]:
# Print out the GDP per capita of only Italy, Austria, and the United Kingdom in the years 2007 and 1957
df_euro.loc[['Italy','Austria','United Kingdom'],['gdpPercap_1957','gdpPercap_2007']]

Unnamed: 0_level_0,gdpPercap_1957,gdpPercap_2007
country,Unnamed: 1_level_1,Unnamed: 2_level_1
Italy,6248.656232,28569.7197
Austria,8842.59803,36126.4927
United Kingdom,11283.17795,33203.26128


---
## EXERCISE:
1. Using the index locations, `print` out the first, third, and eight columns for the sixteenth through nineteenth rows.

In [131]:
print(df_euro.iloc[[15,16,17,18],[0,2,7]])

             Unnamed: 0 continent  gdpPercap_1972
country                                          
Italy               779    Europe    12269.273780
Montenegro         1019    Europe     7778.414017
Netherlands        1091    Europe    18794.745670
Norway             1151    Europe    18965.055510


---
## Result of slicing can be used in further operations.

In [135]:
# Print out the max value (.max()) of ALL countries from Italy to Poland for 1962 to 1972
print(df_euro.loc['Italy':'Poland','gdpPercap_1962':'gdpPercap_1972'])
print('\nNext One\n')
print(df_euro.loc['Italy':'Poland','gdpPercap_1962':'gdpPercap_1972'].max())

             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
country                                                    
Italy           8243.582340    10022.401310    12269.273780
Montenegro      4649.593785     5907.850937     7778.414017
Netherlands    12790.849560    15363.251360    18794.745670
Norway         13450.401510    16361.876470    18965.055510
Poland          5338.752143     6557.152776     8006.506993

Next One

gdpPercap_1962    13450.40151
gdpPercap_1967    16361.87647
gdpPercap_1972    18965.05551
dtype: float64


In [136]:
# Print out the min value (.min()) of ALL countries from Italy to Poland for 1962 to 1972
print(df_euro.loc['Italy':'Poland','gdpPercap_1962':'gdpPercap_1972'])
print('\nNext One\n')
print(df_euro.loc['Italy':'Poland','gdpPercap_1962':'gdpPercap_1972'].min())

             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
country                                                    
Italy           8243.582340    10022.401310    12269.273780
Montenegro      4649.593785     5907.850937     7778.414017
Netherlands    12790.849560    15363.251360    18794.745670
Norway         13450.401510    16361.876470    18965.055510
Poland          5338.752143     6557.152776     8006.506993

Next One

gdpPercap_1962    4649.593785
gdpPercap_1967    5907.850937
gdpPercap_1972    7778.414017
dtype: float64


*   Usually don't just print a slice.
*   All the statistical operators that work on entire data frames work the same way on slices.

## Create data frame from selections

* We can create new data frame by selecting data frames based on values and assigining it to a variable

In [137]:
# Create a selection of ALL countries from Italy to Poland for 1962 to 1972 and 
#  assign the selection to a variable name "subset_df"
subset_df = df_euro.loc['Italy':'Poland','gdpPercap_1962':'gdpPercap_1972']

print('Subset of data:\n', subset_df)

Subset of data:
              gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
country                                                    
Italy           8243.582340    10022.401310    12269.273780
Montenegro      4649.593785     5907.850937     7778.414017
Netherlands    12790.849560    15363.251360    18794.745670
Norway         13450.401510    16361.876470    18965.055510
Poland          5338.752143     6557.152776     8006.506993


## Create DataFrame using query
* We can query values in a data frame to create new selections
* By passing a dataframe query to itself, we can create a new dataframe with only those values

In [157]:
# Create a query for the "gdpPerCap_1962" series in out subset_df data frame for all values greater than 10000

# Create a new data frame called subset_10k_df by passing that query to the subset_df data frame

query_10k = subset_df[subset_df.gdpPercap_1962>10000]
print(query_10k)
subset_10k_df = pd.DataFrame(query_10k)
print(subset_10k_df)
print(subset_10k_df.shape)

             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
country                                                    
Netherlands     12790.84956     15363.25136     18794.74567
Norway          13450.40151     16361.87647     18965.05551
             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
country                                                    
Netherlands     12790.84956     15363.25136     18794.74567
Norway          13450.40151     16361.87647     18965.05551
(2, 3)


---
## EXERCISE:

* Create three data frames and get the size of each one.
    1. Countries with a gdp per capita in 1952 above 10000
    1. Countries with a gdp per capita in 1962 above 10000
    1. Countries with a gdp per capita in 1972 above 10000

---

In [166]:
subset_1952_72 = df_all3.loc[:,['gdpPercap_1952','gdpPercap_1962','gdpPercap_1972']]
print(subset_1952_72)
q1952_10k = subset_1952_72[subset_1952_72.gdpPercap_1952>10000]
subset_10k_1952 = pd.DataFrame(q1952_10k)
print(subset_10k_1952)
print(subset_10k_1952.shape, "is the size of the new 1952 dataframe")

q1962_10k = subset_1952_72[subset_1952_72.gdpPercap_1962>10000]
subset_10k_1962 = pd.DataFrame(q1962_10k)
print(subset_10k_1962)
print(subset_10k_1962.shape, "is the size of the new 1962 dataframe")

q1972_10k = subset_1952_72[subset_1952_72.gdpPercap_1972>10000]
subset_10k_1972 = pd.DataFrame(q1972_10k)
print(subset_10k_1972)
print(subset_10k_1972.shape, "is the size of the new 1972 dataframe")

                          gdpPercap_1952  gdpPercap_1962  gdpPercap_1972
country                                                                 
Afghanistan                   779.445314      853.100710      739.981106
Albania                      1601.056136     2312.888958     3313.422188
Algeria                      2449.008185     2550.816880     4182.663766
Angola                       3520.610273     4269.276742     5473.288005
Argentina                    5911.315053     7133.166023     9443.038526
Australia                   10039.595640    12217.226860    16788.629480
Austria                      6137.076492    10750.721110    16661.625600
Bahrain                      9867.084765    12753.275140    18268.658390
Bangladesh                    684.244172      686.341554      630.233627
Belgium                      8343.105127    10991.206760    16672.143560
Benin                        1062.752200      949.499064     1085.796879
Bolivia                      2677.326347     2180.9

## Filter a DataFrame using a Boolean mask

* A frame full of Booleans is sometimes called a *mask* because of how it can be used
* Comparison is applied element by element
* Returns a similarly-shaped data frame of `True` and `False`

In [168]:
# Create a full data frame mask for subset_df with a query for all values in all years greater than 10000
mask_10k = subset_df>10000

print( mask_10k )

             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
country                                                    
Italy                 False            True            True
Montenegro            False           False           False
Netherlands            True            True            True
Norway                 True            True            True
Poland                False           False           False


* We can use masks to filter an entire dataframe with a single query
    * More efficient than using a single query on multiple columns

In [172]:
# Pass the mask query to subset_df to create a new data frame mask_subset
mask_subset = subset_df[mask_10k]

print(mask_subset)
print("Shape: ", mask_subset.shape)

             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
country                                                    
Italy                   NaN     10022.40131     12269.27378
Montenegro              NaN             NaN             NaN
Netherlands     12790.84956     15363.25136     18794.74567
Norway          13450.40151     16361.87647     18965.05551
Poland                  NaN             NaN             NaN
Shape:  (5, 3)


*   Returns the value where the mask is true, and NaN (Not a Number) where it is false.
*   Useful because NaNs are ignored by operations like max, min, average, etc.


* If we wanted to remove all rows with a NaN value in any column we could use the `.dropna()` function

In [174]:
# Print the mask_subset data frame with all rows with a single NaN value removed
maskdrop=mask_subset.dropna()
print(maskdrop)

# Print the shape of the mask_subset data frame with all rows with an NaN values removed
print('Shape',maskdrop.shape)

             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
country                                                    
Netherlands     12790.84956     15363.25136     18794.74567
Norway          13450.40151     16361.87647     18965.05551
Shape (2, 3)


## Create new columns

* We can easily create new columns in the same way we would add a key and value to a dictionary

In [180]:
# Create a new column in data frame called diff_07_52 that is the difference between gdp per capita from 1952 to 2007

df_all3['diff_07_52'] = df_all3.gdpPercap_2007-df_all.gdpPercap_1952
#print(df_all3.head(10))
df_all3.loc[:,['gdpPercap_1952','gdpPercap_2007','diff_07_52']]

Unnamed: 0_level_0,gdpPercap_1952,gdpPercap_2007,diff_07_52
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Afghanistan,779.445314,974.580338,195.135024
Albania,1601.056136,5937.029526,4335.973390
Algeria,2449.008185,6223.367465,3774.359280
Angola,3520.610273,4797.231267,1276.620994
Argentina,5911.315053,12779.379640,6868.064587
Australia,10039.595640,34435.367440,24395.771800
Austria,6137.076492,36126.492700,29989.416208
Bahrain,9867.084765,29796.048340,19928.963575
Bangladesh,684.244172,1391.253792,707.009620
Belgium,8343.105127,33692.605080,25349.499953


---
## EtherPad

On EtherPad explain what the follow expression does:

    only_Am = df[df['continent'] == 'Americas']

___

## EXERCISE:
1. Explain in simple terms what `idxmin` and `idxmax` do in the short program below.
    ~~~
    df = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
    print(df.idxmin())
    print(df.idymax())
    ~~~

2. When would you use these methods?

In [194]:
print(df_all3.diff_07_52.idxmin())
print(df_all3.diff_07_52.idxmax())
df_all3.to_csv('gdpDiff.csv')
#print(df_all3.country.idymax())

Kuwait
Singapore


---
## PRACTICE EXERCISE.
Using the Gapminder GDP data for Europe, write an expression to select each of the following:
1.  GDP per capita for all countries in 1982.
1.  GDP per capita for Denmark for all years.
1.  GDP per capita for all countries for years *after* 1985.
1.  GDP per capita for each country in 2007 as a multiple of GDP per capita for that country in 1952.
---

# -- COMMIT YOUR WORK TO GITHUB --

---
## Keypoints:
 - "Use `DataFrame.iloc[..., ...]` to select values by index location."
 - "Use `:` on its own to mean all columns or all rows."
 - "Select multiple columns or rows using `DataFrame.ix` and a named slice."
 - "Result of slicing can be used in further operations."
 - "Use comparisons to select data based on value."
 - "Select values or NaN using a Boolean mask."