Challenge: Summarizing Data
===========================

The American Community Survey is a U.S. Census Bureau survey that collects data on everything from housing affordability to industry employment rates. For this challenge, you'll use data that the team at FiveThirtyEight derived from the 2010-2012 American Community Surveys. FiveThirtyEight cleaned the data set and made it available in a GitHub repository.

Here's a quick overview of the files we'll be working with:

- **all-ages.csv** - Employment data by major for all ages
- **recent-grads.csv** - Employment data by major for recent college graduates only

Here are descriptions for a few of the columns (out of 21 total columns):

- **Rank** - The major's numerical rank, by post-graduation median earnings
- **Major_code** - The major's numerical code
- **Major** - The major's description
- **Major_category** - The major's category
- **Total** - The total number of people who studied the major
- **Men** - The number of men who studied the major
- **Women** - The number of women who studied the major
- **ShareWomen** - The share of women (from 0 to 1) who studied the major
- **Employed** - The number of people who studied the major and obtained a job after graduating


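Here are the first few rows and columns in recent-grads.csv. The data set all-ages.csv has a similar but not identical structure: it shares columns such as Major_code, Major, Major_category, Total, and Employed, but the `head()` output further down starts at Major_code, with no Rank column.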

In [2]:
from IPython.display import display, HTML
display(HTML('<table class="table table-bordered"> <thead><tr> <th>Rank</th> <th>Major_code</th> <th>Major</th> <th>Major_category</th> <th>Total</th> <th>Sample_size</th> <th>Men</th> <th>Women</th> <th>ShareWomen</th> <th>Employed</th> </tr> </thead> <tbody> <tr> <td>1</td> <td>2419</td> <td>PETROLEUM ENGINEERING</td> <td>Engineering</td> <td>2339</td> <td>36</td> <td>2057</td> <td>282</td> <td>0.120564</td> <td>1976</td> </tr> <tr> <td>2</td> <td>2416</td> <td>MINING AND MINERAL ENGINEERING</td> <td>Engineering</td> <td>756</td> <td>7</td> <td>679</td> <td>77</td> <td>0.101852</td> <td>640</td> </tr> <tr> <td>3</td> <td>2415</td> <td>METALLURGICAL ENGINEERING</td> <td>Engineering</td> <td>856</td> <td>3</td> <td>725</td> <td>131</td> <td>0.153037</td> <td>648</td> </tr> <tr> <td>4</td> <td>2417</td> <td>NAVAL ARCHITECTURE AND MARINE ENGINEERING</td> <td>Engineering</td> <td>1258</td> <td>16</td> <td>1123</td> <td>135</td> <td>0.107313</td> <td>758</td> </tr> <tr> <td>5</td> <td>2405</td> <td>CHEMICAL ENGINEERING</td> <td>Engineering</td> <td>32260</td> <td>289</td> <td>21239</td> <td>11021</td> <td>0.341631</td> <td>25694 </td> </tr> </tbody> </table>'))

Rank,Major_code,Major,Major_category,Total,Sample_size,Men,Women,ShareWomen,Employed
1,2419,PETROLEUM ENGINEERING,Engineering,2339,36,2057,282,0.120564,1976
2,2416,MINING AND MINERAL ENGINEERING,Engineering,756,7,679,77,0.101852,640
3,2415,METALLURGICAL ENGINEERING,Engineering,856,3,725,131,0.153037,648
4,2417,NAVAL ARCHITECTURE AND MARINE ENGINEERING,Engineering,1258,16,1123,135,0.107313,758
5,2405,CHEMICAL ENGINEERING,Engineering,32260,289,21239,11021,0.341631,25694
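
The column descriptions suggest two relationships that can be checked against the rows above: Men plus Women should equal Total, and ShareWomen should equal Women divided by Total. Both relationships are an inference from the descriptions, not something the data set documents here. The sketch below uses plain Python on an in-memory copy of the five sample rows, so it runs without the CSV files; `SAMPLE`, `totals_match`, and `shares_match` are illustrative names.

```python
import csv
import io

# The five sample rows shown above, copied verbatim.
SAMPLE = """\
Rank,Major_code,Major,Major_category,Total,Sample_size,Men,Women,ShareWomen,Employed
1,2419,PETROLEUM ENGINEERING,Engineering,2339,36,2057,282,0.120564,1976
2,2416,MINING AND MINERAL ENGINEERING,Engineering,756,7,679,77,0.101852,640
3,2415,METALLURGICAL ENGINEERING,Engineering,856,3,725,131,0.153037,648
4,2417,NAVAL ARCHITECTURE AND MARINE ENGINEERING,Engineering,1258,16,1123,135,0.107313,758
5,2405,CHEMICAL ENGINEERING,Engineering,32260,289,21239,11021,0.341631,25694
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Assumption: Men + Women adds up to Total for every major.
totals_match = [
    int(r["Men"]) + int(r["Women"]) == int(r["Total"]) for r in rows
]

# Assumption: ShareWomen is Women / Total, up to the rounding in the printed values.
shares_match = [
    abs(int(r["Women"]) / int(r["Total"]) - float(r["ShareWomen"])) < 1e-5
    for r in rows
]

print(all(totals_match), all(shares_match))
```

Both checks hold for these five rows. That says nothing about the rest of the file, where missing values could break either relationship.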


<div class="alert alert-info">
    <ul>
        <li>Read all-ages.csv into a DataFrame object, and assign it to all_ages.</li>
        <li>Read recent-grads.csv into a DataFrame object, and assign it to recent_grads.</li>
        <li>Display the first five rows of all_ages and recent_grads.</li>
    </ul>
</div>
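
Once both files are loaded, it can help to compare their column lists before comparing values. The sketch below is one possible way to do that, not part of the challenge itself. It uses small in-memory stand-ins built from a few rows and columns printed in this notebook, so it runs without the CSV files; `ALL_AGES_CSV`, `RECENT_GRADS_CSV`, and the `_demo` DataFrames are hypothetical names, and the stand-ins contain only a subset of the real columns.

```python
import io

import pandas as pd

# Hypothetical stand-ins: a few rows and columns copied from this notebook's output.
ALL_AGES_CSV = """\
Major_code,Major,Major_category,Total,Employed
1100,GENERAL AGRICULTURE,Agriculture & Natural Resources,128148,90245
1101,AGRICULTURE PRODUCTION AND MANAGEMENT,Agriculture & Natural Resources,95326,76865
"""
RECENT_GRADS_CSV = """\
Rank,Major_code,Major,Major_category,Total,Employed
1,2419,PETROLEUM ENGINEERING,Engineering,2339,1976
2,2416,MINING AND MINERAL ENGINEERING,Engineering,756,640
"""

# read_csv accepts a file-like object as well as a path.
all_ages_demo = pd.read_csv(io.StringIO(ALL_AGES_CSV))
recent_grads_demo = pd.read_csv(io.StringIO(RECENT_GRADS_CSV))

# Columns present in both frames, in recent_grads order.
shared = [c for c in recent_grads_demo.columns if c in all_ages_demo.columns]

# Columns that only the recent-grads stand-in has.
only_recent = [c for c in recent_grads_demo.columns if c not in all_ages_demo.columns]

print(shared)
print(only_recent)
```

Running the same two list comprehensions on the real `all_ages` and `recent_grads` DataFrames would show the full set of differences.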

In [1]:
import pandas as pd

In [3]:
all_ages = pd.read_csv("all-ages.csv")

In [6]:
print(all_ages.head(5))

   Major_code                                  Major  \
0        1100                    GENERAL AGRICULTURE   
1        1101  AGRICULTURE PRODUCTION AND MANAGEMENT   
2        1102                 AGRICULTURAL ECONOMICS   
3        1103                        ANIMAL SCIENCES   
4        1104                           FOOD SCIENCE   

                    Major_category   Total  Employed  \
0  Agriculture & Natural Resources  128148     90245   
1  Agriculture & Natural Resources   95326     76865   
2  Agriculture & Natural Resources   33955     26321   
3  Agriculture & Natural Resources  103549     81177   
4  Agriculture & Natural Resources   24280     17281   

   Employed_full_time_year_round  Unemployed  Unemployment_rate  Median  \
0                          74078        2423           0.026147   50000   
1                          64240        2266           0.028636   54000   
2                          22810         821           0.030248   63000   
3                         

In [5]:
recent_grads = pd.read_csv("recent-grads.csv")

In [7]:
print(recent_grads.head(5))

   Rank  Major_code                                      Major Major_category  \
0     1        2419                      PETROLEUM ENGINEERING    Engineering   
1     2        2416             MINING AND MINERAL ENGINEERING    Engineering   
2     3        2415                  METALLURGICAL ENGINEERING    Engineering   
3     4        2417  NAVAL ARCHITECTURE AND MARINE ENGINEERING    Engineering   
4     5        2405                       CHEMICAL ENGINEERING    Engineering   

   Total  Sample_size    Men  Women  ShareWomen  Employed      ...        \
0   2339           36   2057    282    0.120564      1976      ...         
1    756            7    679     77    0.101852       640      ...         
2    856            3    725    131    0.153037       648      ...         
3   1258           16   1123    135    0.107313       758      ...         
4  32260          289  21239  11021    0.341631     25694      ...         

   Part_time  Full_time_year_round  Unemployed  Unemploy