In [2]:
#######################
# standard code block #
#######################

%pylab inline
# see https://ipython.readthedocs.io/en/stable/interactive/magics.html

%config InlineBackend.figure_format = 'svg'


# Introduction to Pandas

In [4]:
import pandas as pd
import numpy as np

## From the Pandas Documentation:

Here are just a few of the things that pandas does well:

- Easy handling of **missing data** (represented as NaN) in floating point as well as non-floating point data
- Size mutability: columns can be **inserted and deleted** from DataFrame and higher dimensional objects
- Automatic and explicit **data alignment**: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations
- Powerful, flexible **group by** functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
- Make it **easy to convert** ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
- Intelligent **label-based slicing**, **fancy indexing**, and **subsetting** of large data sets
- Intuitive **merging** and **joining** data sets
- Flexible **reshaping** and **pivoting** of data sets
- **Hierarchical labeling** of axes (possible to have multiple labels per tick)
- **Robust IO tools** for loading data from flat files (CSV and delimited), Excel files, databases, and saving / loading data from the ultrafast HDF5 format
- **Time series**-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.

### 10 Minutes Intro to Pandas ###

Pandas has an official 10 minute intro.

http://pandas.pydata.org/pandas-docs/stable/10min.html

## Set Up Pandas Default Parameters

In [5]:
# if you run into trouble, it's often helpful to know which version you're on
print("Pandas version:", pd.__version__)
print("Numpy version:", np.__version__)

Pandas version: 1.2.4
Numpy version: 1.20.1


In [6]:
# various options in pandas
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 25)
pd.set_option('display.precision', 3)

## Data Structures

### 1. Series

A one-dimensional array (vector) of values. Think of a Series as one of your data columns. One important aspect is that it carries an "index", which you can think of as a row label.

### 2. DataFrames

Think of a DataFrame as a table with columns. This is the workhorse of everything you will do with data analysis. Learning pandas and its functions can be challenging, but stick with it and ask questions. Structurally, a DataFrame can be thought of as a collection of Series objects that share the same index.
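
As a minimal sketch of that last point, the cell below builds a DataFrame from two Series that share an index. The names `ages`, `hours`, and `people` and the values are illustrative assumptions, not part of the notebook's data.

```python
import pandas as pd

# Two Series that share the same index labels (hypothetical example data)
ages = pd.Series([39, 50, 38], index=["a", "b", "c"], name="age")
hours = pd.Series([40, 13, 40], index=["a", "b", "c"], name="hours_per_week")

# Combining them column-wise yields a DataFrame with that shared index
people = pd.concat([ages, hours], axis=1)
print(people)

# Selecting one column hands back a Series carrying the same index
age_column = people["age"]
print(type(age_column).__name__)
print(list(age_column.index))
```

Selecting a column returns a Series, which is why Series methods such as `.mean()` and `.value_counts()` work on `df['column']` later in this notebook.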

In [52]:
import pandas as pd 
import numpy as np


### 3. [Panel Data](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Panel.html)

Three-dimensional arrays. (Mentioned for reference only: `Panel` was deprecated in pandas 0.20 and removed in 0.25, so it is not available in the version used here.)

## So, What is a Pandas DataFrame?

In [53]:
pd.Series?

In [58]:
## Make a Series (np.nan marks a missing value)
s = pd.Series([1, 3, 5, np.nan, 6, 8])
# a Series with an explicit index
L = pd.Series(index=[24, 78], data=["Hi", "j"])
L

# o=pd.Series(data=2,index=["School","University"])
# o["School"]
# print(s)

24    Hi
78     j
dtype: object

In [10]:
# pd.DataFrame?
import numpy 
numpy.random?

In [11]:
numpy.random.random_integers?

In [12]:
## Make a dataframe from a numpy array
import pandas as pd
import numpy as np
# random_integers is deprecated; randint excludes the upper bound, so 5 draws from 2 to 4
# pass the column names as a list: a set has no guaranteed order
df1 = pd.DataFrame(np.random.randint(2, 5, size=(1, 4)), columns=['A', 'B', 'C', 'D'])
df1
# type({'A','B','C','D'})
# index=["Hind","Mai","Nourse","Fai"]


Unnamed: 0,A,B,C,D
0,3,2,4,4


In [13]:
## Make a dataframe from a dictionary
df= pd.DataFrame({
    'A': 1.,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(9, index=list(range(3))),
    'D': np.array([3] * 3),
    'E': pd.Categorical(["test", "train", "test"]),
    'F': 'foo'
})
dfs = pd.DataFrame({
    'Name': "Hind Alamri",
    'birthday': pd.Timestamp('20130102'),
    'talent': ['music', 'math'],
    'index': ['A', 'B']
})
dfs
# pd.array?
# i=[3] * 4
# i
# pd.Series(data=9, index=list(range(4)))

Unnamed: 0,Name,birthday,talent,index
0,Hind Alamri,2013-01-02,music,A
1,Hind Alamri,2013-01-02,math,B
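
The dictionary form works because pandas repeats scalar values to match the length of the list-like values. The sketch below illustrates that rule with made-up names (`team`, `single`) and also shows what happens when every value is a scalar.

```python
import pandas as pd

# Scalars ("Ada", the timestamp) are repeated to match the list's length
team = pd.DataFrame({
    "name": "Ada",
    "joined": pd.Timestamp("2013-01-02"),
    "talent": ["music", "math"],
})
print(team.shape)

# With only scalars there is no length to broadcast to, so pandas
# asks for an explicit index
try:
    pd.DataFrame({"name": "Ada", "score": 1.0})
except ValueError as err:
    print("ValueError:", err)

single = pd.DataFrame({"name": "Ada", "score": 1.0}, index=[0])
print(single)
```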


## Load a Data Set

### [Census Income Dataset](http://archive.ics.uci.edu/ml/datasets/Census+Income)

pandas can load a lot more than CSV files. This tutorial shows how pandas can read Excel files, SQL query results, and even clipboard contents (copy and paste):
http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/

In [14]:
# download the data and name the columns
cols = [
    'age', 'workclass', 'fnlwgt', 'education', 'education_num',
    'marital_status', 'occupation', 'relationship', 'ethnicity', 'gender',
    'capital_gain', 'capital_loss', 'hours_per_week', 'country_of_origin',
    'income'
]

df11 = pd.read_csv(
    'http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data',
    names=cols)


In [15]:
df11.head(32566)


Unnamed: 0,age,workclass,fnlwgt,education,education_num,marital_status,occupation,relationship,ethnicity,gender,capital_gain,capital_loss,hours_per_week,country_of_origin,income
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32556,27,Private,257302,Assoc-acdm,12,Married-civ-spouse,Tech-support,Wife,White,Female,0,0,38,United-States,<=50K
32557,40,Private,154374,HS-grad,9,Married-civ-spouse,Machine-op-inspct,Husband,White,Male,0,0,40,United-States,>50K
32558,58,Private,151910,HS-grad,9,Widowed,Adm-clerical,Unmarried,White,Female,0,0,40,United-States,<=50K
32559,22,Private,201490,HS-grad,9,Never-married,Adm-clerical,Own-child,White,Male,0,0,20,United-States,<=50K


### Q: What's happening in the above cell?

In [16]:
df11.info()
df11.tail()
# .columns
# .values
# .dtypes

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   age                32561 non-null  int64 
 1   workclass          32561 non-null  object
 2   fnlwgt             32561 non-null  int64 
 3   education          32561 non-null  object
 4   education_num      32561 non-null  int64 
 5   marital_status     32561 non-null  object
 6   occupation         32561 non-null  object
 7   relationship       32561 non-null  object
 8   ethnicity          32561 non-null  object
 9   gender             32561 non-null  object
 10  capital_gain       32561 non-null  int64 
 11  capital_loss       32561 non-null  int64 
 12  hours_per_week     32561 non-null  int64 
 13  country_of_origin  32561 non-null  object
 14  income             32561 non-null  object
dtypes: int64(6), object(9)
memory usage: 3.7+ MB


Unnamed: 0,age,workclass,fnlwgt,education,education_num,marital_status,occupation,relationship,ethnicity,gender,capital_gain,capital_loss,hours_per_week,country_of_origin,income
32556,27,Private,257302,Assoc-acdm,12,Married-civ-spouse,Tech-support,Wife,White,Female,0,0,38,United-States,<=50K
32557,40,Private,154374,HS-grad,9,Married-civ-spouse,Machine-op-inspct,Husband,White,Male,0,0,40,United-States,>50K
32558,58,Private,151910,HS-grad,9,Widowed,Adm-clerical,Unmarried,White,Female,0,0,40,United-States,<=50K
32559,22,Private,201490,HS-grad,9,Never-married,Adm-clerical,Own-child,White,Male,0,0,20,United-States,<=50K
32560,52,Self-emp-inc,287927,HS-grad,9,Married-civ-spouse,Exec-managerial,Wife,White,Female,15024,0,40,United-States,>50K


## Viewing Data

* .info() 
* .head()
* .tail()
* .columns
* .values
* .dtypes

### info

Displays the column names, dtypes, non-null counts, and memory usage of the dataframe

In [17]:
# every column reports 3 non-null values out of 3 entries, so there are no nulls
# pandas has already inferred a dtype for each column, so none needs to be set by hand

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   A       3 non-null      float64       
 1   B       3 non-null      datetime64[ns]
 2   C       3 non-null      int64         
 3   D       3 non-null      int64         
 4   E       3 non-null      category      
 5   F       3 non-null      object        
dtypes: category(1), datetime64[ns](1), float64(1), int64(2), object(1)
memory usage: 271.0+ bytes


### head

Displays the first few rows in the dataframe

In [18]:
# .head() shows the first 5 rows by default; pass a number, e.g. .head(10), to ask for more
df.head(10)

Unnamed: 0,A,B,C,D,E,F
0,1.0,2013-01-02,9,3,test,foo
1,1.0,2013-01-02,9,3,train,foo
2,1.0,2013-01-02,9,3,test,foo


### tail

Displays the last few rows in the dataframe

In [19]:
df.tail(10)

Unnamed: 0,A,B,C,D,E,F
0,1.0,2013-01-02,9,3,test,foo
1,1.0,2013-01-02,9,3,train,foo
2,1.0,2013-01-02,9,3,test,foo


### sample

Displays a sample of rows in the dataframe

In [20]:
# head and tail are good, but sometimes we want to randomly sample rows
df.sample(1, random_state=19160)

Unnamed: 0,A,B,C,D,E,F
1,1.0,2013-01-02,9,3,train,foo


### Q: What do you expect to happen when you re-run the cell?

What actually happens? Why?
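
One way to check your answer is the sketch below. It uses a made-up 100-row frame (`numbers` is an illustrative name, not the notebook's data) and compares seeded and unseeded calls to `.sample()`.

```python
import pandas as pd

numbers = pd.DataFrame({"value": range(100)})

# The same seed always selects the same rows
first = numbers.sample(3, random_state=19160)
second = numbers.sample(3, random_state=19160)
print(first.index.equals(second.index))

# Without random_state each call draws a fresh random sample
unseeded = numbers.sample(3)
print(unseeded)
```

Fixing `random_state` makes a notebook reproducible; leaving it out is the better choice when you want to look at different rows on each run.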

### Columns

Returns the column labels of the dataframe as an `Index` object

In [21]:
# view all columns of the dataframe
df.columns

Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')

### Column Types

Returns the type of each column

In [22]:
df.dtypes

A           float64
B    datetime64[ns]
C             int64
D             int64
E          category
F            object
dtype: object

## Rename Columns

In [33]:
df.columns

Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')

In [34]:
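# replace a column name
# inplace=False (the default) returns a renamed copy and leaves df untouched
df.rename(columns={'age':'Hi'}, inplace=False)
# inplace=True modifies df itself and returns None
df.rename(columns={'fnlwgt':'Hi'}, inplace=True)
# df has no 'age' or 'fnlwgt' column and rename ignores unknown names, so nothing changes below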

df.head()


Unnamed: 0,A,B,C,D,E,F
0,1.0,2013-01-02,9,3,test,foo
1,1.0,2013-01-02,9,3,train,foo
2,1.0,2013-01-02,9,3,test,foo


### Q: What does `inplace` do above?

In [35]:
df11.inplace

AttributeError: 'DataFrame' object has no attribute 'inplace'
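
`inplace` is a keyword argument of methods such as `rename`, not an attribute of the DataFrame, which is why the lookup above fails. The following sketch, on a small made-up frame (`frame` is an assumed name), contrasts the two settings.

```python
import pandas as pd

frame = pd.DataFrame({"age": [39, 50], "fnlwgt": [77516, 83311]})

# Default (inplace=False): a renamed copy is returned, frame is unchanged
renamed = frame.rename(columns={"age": "years"})
print(list(frame.columns))
print(list(renamed.columns))

# inplace=True: frame itself is modified and the call returns None
result = frame.rename(columns={"fnlwgt": "weight"}, inplace=True)
print(result)
print(list(frame.columns))
```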

## Descriptives 

* .describe()
* .value_counts()
* .mean()
* .unique()

### describe

Displays summary statistics for each numerical column

In [36]:
df.describe()

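
As a rough illustration of what `describe` reports, the sketch below runs it on a small made-up frame (`frame` and its values are assumptions). By default only numeric columns are summarized; `include="all"` also covers text columns.

```python
import pandas as pd

frame = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "education": ["Bachelors", "Bachelors", "HS-grad", "11th"],
})

# Numeric columns only: count, mean, std, min, quartiles, max
numeric_summary = frame.describe()

# All columns: text columns add unique, top and freq
full_summary = frame.describe(include="all")

print(numeric_summary)
print(full_summary)
```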

### value_counts

Counts the number of occurrences of each categorical value for the column

In [26]:
df= pd.DataFrame({
    'A': 1.,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(9, index=list(range(3))),
    'D': np.array([3] * 3),
    'E': pd.Categorical(["test", "train", "test"]),
    'F': 'foo'
})


In [30]:
df['E']

0     test
1    train
2     test
Name: E, dtype: category
Categories (2, object): ['test', 'train']

In [None]:
df["G"]=[1,1,1]

# df.education is the same as df['education']

In [39]:
df

Unnamed: 0,A,B,C,D,E,F,G
0,1.0,2013-01-02,9,3,test,foo,1
1,1.0,2013-01-02,9,3,train,foo,1
2,1.0,2013-01-02,9,3,test,foo,1


In [34]:
type(df.B)

pandas.core.series.Series

In [37]:
df.C.value_counts()

## Also works for numeric columns - treating the individual values as factors

9    3
Name: C, dtype: int64
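
The column `C` above holds a single value, so the result is not very telling. A minimal sketch on a made-up text column (`education` here is a standalone Series, not the census column) shows the sorted counts and the `normalize` option.

```python
import pandas as pd

education = pd.Series(["HS-grad", "Bachelors", "HS-grad", "11th", "HS-grad"])

# Counts per distinct value, most frequent first
counts = education.value_counts()

# Fractions of the total instead of raw counts
shares = education.value_counts(normalize=True)

print(counts)
print(shares)
```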

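In [None]:
# from here on df is the census data; the outputs below show 'age', 'fnlwgt' and
# 'country_of_origin' under the names 'Hind', 'Hi' and 'native_country'
df = df11.rename(columns={'age': 'Hind', 'fnlwgt': 'Hi', 'country_of_origin': 'native_country'})
type(df.education.value_counts())
# value_counts() returns a Series: the index holds the distinct values, the data their counts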

In [None]:
df.education.value_counts().plot(kind='line')

In [None]:
df.plot

In [None]:
df.hours_per_week.mean()

# Can also do:
df['hours_per_week'].mean()

### Q: What do you think we will get if we ask for the `type` of `df.hours_per_week` ?


In [None]:
type(df['hours_per_week'])

In [None]:
df['hours_per_week']

### Unique

Returns the unique values for the column

In [None]:
# there's a space before each string in this data
df.education.unique()

In [None]:
# looks like it's in every object column
df.workclass.unique()

In [None]:
df["education"] = df.education.str.strip()

In [None]:
# Hurray We removed the leading space
df.education.unique()

In [None]:
df.gender.unique()

In [None]:
# Remove leading space in values
df["gender"] = df.gender.str.strip()

In [None]:
df.gender.unique()
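
The cells above strip one column at a time. The sketch below, using a two-row made-up sample in the same comma-plus-space layout (`raw`, `as_is`, `stripped`, and `clean` are assumed names), shows two ways to handle several columns: stripping in a loop after loading, or passing `skipinitialspace=True` to `read_csv`.

```python
import io
import pandas as pd

raw = "39, State-gov, Male\n50, Private, Female\n"
cols = ["age", "workclass", "gender"]

# Default parsing keeps the space that follows each comma
as_is = pd.read_csv(io.StringIO(raw), names=cols)
print(as_is.gender.unique())

# Option 1: strip the text columns after loading
stripped = as_is.copy()
for col in ["workclass", "gender"]:
    stripped[col] = stripped[col].str.strip()

# Option 2: let the parser skip spaces that follow the delimiter
clean = pd.read_csv(io.StringIO(raw), names=cols, skipinitialspace=True)
print(clean.gender.unique())
```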

## Selecting Rows and Columns 

### .loc 

* Selects rows and columns by label (name)
* **by label**             `.loc[]`

### .iloc

* Selects rows and columns by integer position
* **by integer position**  `.iloc[]`

http://pandas.pydata.org/pandas-docs/stable/indexing.html
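
The difference is easiest to see when the index labels are not 0, 1, 2, and so on. The following sketch uses a made-up frame (`frame`) whose labels are 10 to 40.

```python
import pandas as pd

# Index labels deliberately differ from the positions 0..3
frame = pd.DataFrame({"score": [1, 2, 3, 4]}, index=[10, 20, 30, 40])

by_label = frame.loc[10, "score"]    # row labelled 10, i.e. the first row
by_position = frame.iloc[3, 0]       # row at position 3, i.e. the last row

label_slice = frame.loc[10:30]       # end label is included
position_slice = frame.iloc[0:2]     # end position is excluded

print(by_label, by_position, len(label_slice), len(position_slice))
```

With the default `RangeIndex` the labels and positions coincide, so the two accessors only differ visibly in how slices treat the end point.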

In [None]:
# select a row
# df.iloc[3]
df1.loc?
# 3 is the integer position, not the label:
# df.iloc[3] returns the row at position 3 (the fourth row)

In [None]:
dfs
dfs.iloc[0]
## Note: iloc[0] returns the first row as a Series; a slice such as iloc[0:3] returns the first 3 rows, similar to the slicing that applies to Python lists

In [None]:
dfs.describe(datetime_is_numeric=False)

In [None]:
dfs

In [None]:
# select a range of rows
df.iloc[10:15]

In [None]:
# last 2 rows
df.iloc[-2:]



In [None]:
# selecting every other row in columns 3-5
df.iloc[::2, 2:5]
# ::2 steps through the rows two at a time; 2:5 selects the columns at positions 2, 3 and 4

In [None]:
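# select a row
df.iloc[7, ::-3]
# 3:: >> starts at position 3 and continues to the end
# ::3 >> takes every third element from the start to the end
# ::-3 >> takes every third element in reverse, starting from the last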

In [None]:
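df.loc[0:3]

### Q: Why did I get 4 rows above here instead of 3?

Integers vs. labels! `.loc[0:3]` slices by label and includes the end label, so it returns the rows labelled 0, 1, 2 and 3, whereas `.iloc[0:3]` slices by position and stops before position 3.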

In [None]:
# .loc can also assign: set the value at row label 3, column label 'Hi'
df.loc[3, 'Hi'] = 3

In [None]:
df.iloc[0:4,0:6]
# df.iloc[rows, columns]
# df.iloc[0:3,2:6]

## Filtering

In [None]:
# 'Hind' holds the ages: the outputs below show the census column 'age' under that name
(df.Hind > 50)
df.relationship

In [None]:
asd = df[df.Hind > 50].head(5)
asd

In [None]:
# Filter for only certain Columns
df.loc[df.Hind > 50, ['Hind', 'education', 'occupation', 'gender', 'income']]
# df.iloc[df.Hind > 50, ['Hind', 'education', 'occupation', 'gender', 'income']] raises an error: .iloc takes integer positions, not a boolean Series or column labels

# What happens if I try to do the same with df.iloc instead of df.loc?
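
As a hedged sketch of the answer, the cell below applies the same filter with both accessors on a small made-up frame (`frame`, `mask`, and the values are assumptions). `.iloc` can reproduce the `.loc` result only after the mask is converted to a plain array and the column labels to positions.

```python
import pandas as pd

frame = pd.DataFrame({
    "age": [39, 52, 61],
    "education": ["Bachelors", "HS-grad", "Masters"],
    "income": ["<=50K", ">50K", ">50K"],
})

# Boolean Series aligned on the index: True where age exceeds 50
mask = frame["age"] > 50

# .loc accepts the boolean Series and column labels directly
by_label = frame.loc[mask, ["age", "income"]]

# .iloc needs a plain boolean array and integer column positions
by_position = frame.iloc[mask.to_numpy(), [0, 2]]

print(by_label)
print(by_label.equals(by_position))
```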


In [577]:
df["relationship"]

0         Not-in-family
1               Husband
2         Not-in-family
3               Husband
4                  Wife
              ...      
32556              Wife
32557           Husband
32558         Unmarried
32559         Own-child
32560              Wife
Name: relationship, Length: 32561, dtype: object

## Now Filter on Gender

In [443]:
# all False in the output below: gender still had its leading space (' Male') when this ran
df.gender == 'Male'

0        False
1        False
2        False
3        False
4        False
         ...  
32556    False
32557    False
32558    False
32559    False
32560    False
Name: gender, Length: 32561, dtype: bool

In [464]:
# all False: the raw values start with a space (' Not-in-family'), so strip the column first
df.relationship == 'Not-in-family'

0        False
1        False
2        False
3        False
4        False
         ...  
32556    False
32557    False
32558    False
32559    False
32560    False
Name: relationship, Length: 32561, dtype: bool

## Now Filter on Gender and Age Between 30 and 40

In [591]:
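# empty result below: gender had not been stripped when this ran, so 'Male' never matched ' Male'
m = (df.gender == 'Male') & (df.Hind >= 30) & (df.Hind <= 40)
df[m]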

Unnamed: 0,Hind,workclass,Hi,education,education_num,marital_status,occupation,relationship,ethnicity,gender,capital_gain,capital_loss,hours_per_week,native_country,income


In [467]:
(df.Hind >= 30) & (df.gender == 'Male')

# the same filter written with .loc; ':' keeps every column
df.loc[(df.Hind >= 30) & (df.gender == 'Male') & (df.Hind <= 40), :]

Unnamed: 0,Hind,workclass,Hi,education,education_num,marital_status,occupation,relationship,ethnicity,gender,capital_gain,capital_loss,hours_per_week,native_country,income
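
For comparison, here is a minimal sketch of the same kind of combined filter on a small made-up frame whose text values carry no leading spaces (`frame` and its values are assumptions), so the filter returns rows. `Series.between` is a more compact way to express the range.

```python
import pandas as pd

frame = pd.DataFrame({
    "age": [25, 30, 35, 40, 45],
    "gender": ["Male", "Female", "Male", "Male", "Female"],
})

# Each comparison needs its own parentheses because & binds more
# tightly than ==, >= and <=
explicit = frame[(frame.gender == "Male") & (frame.age >= 30) & (frame.age <= 40)]

# between() includes both bounds by default
compact = frame[(frame.gender == "Male") & frame.age.between(30, 40)]

print(compact)
```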


## Find Nulls

In [473]:
# as we saw with df.info() there are no nulls...
# but if there were this would find the rows where age is null
df[df.Hind.isnull()]

Unnamed: 0,Hind,workclass,Hi,education,education_num,marital_status,occupation,relationship,ethnicity,gender,capital_gain,capital_loss,hours_per_week,native_country,income


## Fill Nulls

In [38]:
null_df = pd.DataFrame([1, 2, 4, np.nan], columns=['column1'])

In [39]:
null_df

Unnamed: 0,column1
0,1.0
1,2.0
2,4.0
3,


In [40]:
# you can also fill nulls with a value or string
null_df.column1.fillna(1000)

0       1.0
1       2.0
2       4.0
3    1000.0
Name: column1, dtype: float64

In [41]:
# fillna does not do it inplace unless you specify
null_df

Unnamed: 0,column1
0,1.0
1,2.0
2,4.0
3,


In [42]:
# you can also fill null with the median or mean value of the column
null_df.fillna(null_df.column1.median(), inplace=True)
null_df

Unnamed: 0,column1
0,1.0
1,2.0
2,4.0
3,2.0


In [43]:
# nothing changes here: the previous cell already filled the only null in place
null_df.fillna('random_string')

Unnamed: 0,column1
0,1.0
1,2.0
2,4.0
3,2.0
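
When several columns need different fill values, `fillna` also accepts a dictionary keyed by column name. The sketch below assumes a small made-up frame (`frame`) with one numeric and one text column.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "hours": [40.0, np.nan, 20.0],
    "workclass": ["Private", None, "State-gov"],
})

# A dict maps each column to its own fill value; a new frame is returned
filled = frame.fillna({"hours": frame["hours"].mean(), "workclass": "Unknown"})

print(filled)
print(frame.isnull().sum())   # the original still has its nulls
```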


## Drop Nulls

In [50]:
null_df = pd.DataFrame([np.nan, 2, 4, np.nan], columns=['column1'])
# NaN is not a built-in name; use np.nan
null = pd.DataFrame([1, 2, 3, np.nan, 5], columns=["Hind"])
null

In [45]:
# dropna(inplace=True) modifies null_df directly and returns None, so w is None
w = null_df.dropna(axis=0, inplace=True)
w

In [46]:
# .isnull() and .notnull() do opposite things
null_df.isnull()

Unnamed: 0,column1
1,False
2,False


In [47]:
null_df.notnull()

Unnamed: 0,column1
1,True
2,True
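
With more than one column, `dropna` offers finer control over which rows are removed. The following sketch uses a made-up two-column frame (`frame`) to show the `how` and `subset` options alongside a per-column null count.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "age": [39, np.nan, 38, np.nan],
    "hours": [40, 13, np.nan, np.nan],
})

null_counts = frame.isnull().sum()        # number of nulls per column
any_null = frame.dropna()                 # drop rows with at least one null
all_null = frame.dropna(how="all")        # drop rows where every value is null
age_only = frame.dropna(subset=["age"])   # only look at the age column

print(null_counts)
print(any_null, all_null, age_only, sep="\n\n")
```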


## Groupby

In [532]:
df.groupby('relationship').count()


Unnamed: 0_level_0,Hind,workclass,Hi,education,education_num,marital_status,occupation,ethnicity,gender,capital_gain,capital_loss,hours_per_week,native_country,income
relationship,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Husband,13193,13193,13193,13193,13193,13193,13193,13193,13193,13193,13193,13193,13193,13193
Not-in-family,8305,8305,8305,8305,8305,8305,8305,8305,8305,8305,8305,8305,8305,8305
Other-relative,981,981,981,981,981,981,981,981,981,981,981,981,981,981
Own-child,5068,5068,5068,5068,5068,5068,5068,5068,5068,5068,5068,5068,5068,5068
Unmarried,3446,3446,3446,3446,3446,3446,3446,3446,3446,3446,3446,3446,3446,3446
Wife,1568,1568,1568,1568,1568,1568,1568,1568,1568,1568,1568,1568,1568,1568


In [533]:
df

Unnamed: 0,Hind,workclass,Hi,education,education_num,marital_status,occupation,relationship,ethnicity,gender,capital_gain,capital_loss,hours_per_week,native_country,income
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32556,27,Private,257302,Assoc-acdm,12,Married-civ-spouse,Tech-support,Wife,White,Female,0,0,38,United-States,<=50K
32557,40,Private,154374,HS-grad,9,Married-civ-spouse,Machine-op-inspct,Husband,White,Male,0,0,40,United-States,>50K
32558,58,Private,151910,HS-grad,9,Widowed,Adm-clerical,Unmarried,White,Female,0,0,40,United-States,<=50K
32559,22,Private,201490,HS-grad,9,Never-married,Adm-clerical,Own-child,White,Male,0,0,20,United-States,<=50K


In [535]:
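# How to group by a column and apply a function like sum, count, or mean
# numeric_only=True keeps only the numeric columns (pandas 2.0+ raises an error on text columns otherwise)
df.groupby(['education']).mean(numeric_only=True)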

Unnamed: 0_level_0,Hind,Hi,education_num,capital_gain,capital_loss,hours_per_week
education,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
10th,37.43,196832.465,6.0,404.574,56.846,37.053
11th,32.356,194928.077,7.0,215.098,50.079,33.926
12th,32.0,199097.508,8.0,284.088,32.337,35.781
1st-4th,46.143,239303.0,2.0,125.875,48.327,38.256
5th-6th,42.886,232448.333,3.0,176.021,68.252,38.898
7th-8th,48.446,188079.172,4.0,233.94,65.669,39.367
9th,41.06,202485.066,5.0,342.089,28.998,38.045
Assoc-acdm,37.381,193424.094,12.0,640.399,93.419,40.504
Assoc-voc,38.554,181936.017,11.0,715.051,72.755,41.611
Bachelors,38.905,188055.915,13.0,1756.3,118.35,42.614


In [545]:
df.groupby('education')['Hind'].mean()
# aggregate the values: mean of 'Hind' within each education group

In [510]:
# To groupby multiple columns with multiple functions attached
df.groupby(['Hind', 'education']).Hind.agg(['count', 'mean'])
# grouped in order of which column is listed first

Unnamed: 0_level_0,Unnamed: 1_level_0,count,mean
Hind,education,Unnamed: 2_level_1,Unnamed: 3_level_1
17,10th,138,17
17,11th,180,17
17,12th,37,17
17,5th-6th,1,17
17,7th-8th,3,17
...,...,...,...
90,Bachelors,9,90
90,HS-grad,14,90
90,Masters,4,90
90,Prof-school,1,90


In [607]:
df.columns

In [548]:
# the agg method can apply a different aggregation to each column
gb = df.groupby(['income', 'native_country'])
gb_aggs = gb.agg({'Hind': 'mean', 'capital_gain': 'sum'})
gb_aggs.sample(9)

Unnamed: 0_level_0,Unnamed: 1_level_0,Hind,capital_gain
income,native_country,Unnamed: 2_level_1,Unnamed: 3_level_1
>50K,Mexico,40.485,140461
<=50K,Trinadad&Tobago,41.176,0
>50K,Haiti,48.0,0
<=50K,Taiwan,29.323,2202
<=50K,United-States,36.817,3323423
<=50K,Canada,41.012,9077
>50K,France,40.167,13401
>50K,Columbia,53.5,0
>50K,Yugoslavia,40.167,5556


In [549]:
# combine groupby with a boolean filter (the leading space in ' United-States' matches the unstripped raw values)
df[df.native_country == ' United-States'].groupby(
    ['education']).hours_per_week.mean()

education
 10th            36.915
 11th            33.682
 12th            34.951
 1st-4th         32.913
 5th-6th         36.979
 7th-8th         39.060
 9th             38.035
 Assoc-acdm      40.657
 Assoc-voc       41.633
 Bachelors       42.709
 Doctorate       47.409
 HS-grad         40.596
 Masters         44.169
 Preschool       28.118
 Prof-school     47.484
 Some-college    38.862
Name: hours_per_week, dtype: float64
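
The groupby cells above follow the split-apply-combine pattern. As a compact sketch on made-up data (`frame`, `summary`, and the output column names are assumptions), named aggregation lets you choose both the function and the resulting column name for each aggregate.

```python
import pandas as pd

frame = pd.DataFrame({
    "education": ["HS-grad", "Bachelors", "HS-grad", "Bachelors"],
    "age": [30, 40, 50, 60],
    "hours_per_week": [40, 45, 20, 35],
})

# Split by education, apply one function per column, combine into a table
summary = frame.groupby("education").agg(
    people=("age", "count"),
    mean_age=("age", "mean"),
    total_hours=("hours_per_week", "sum"),
)
print(summary)
```

Each keyword names an output column, and each tuple gives the source column and the function to apply to it.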

## Sort

* .sort_index() to sort by index
* .sort_values() to sort by values
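
A minimal sketch of both methods, on a small made-up frame with a shuffled index (`frame` and its values are assumptions), might look like this. Both return a new sorted frame and leave the original untouched.

```python
import pandas as pd

frame = pd.DataFrame(
    {"age": [50, 28, 39], "education": ["Bachelors", "Masters", "HS-grad"]},
    index=[2, 0, 1],
)

# Sort rows by their index labels
by_index = frame.sort_index()

# Sort rows by one column, largest value first
by_age = frame.sort_values("age", ascending=False)

# Sort by several columns: education first, then age within ties
by_two = frame.sort_values(["education", "age"])

print(by_index, by_age, by_two, sep="\n\n")
```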