
# Practice Pandas Library
## Introduction
 This notebook will guide you through the fundamentals of the pandas library, a powerful tool for data manipulation and analysis in Python. By the end of this tutorial, you'll be equipped with the skills to handle, analyze, and visualize data effectively using pandas.
## What is pandas?

**pandas** is arguably the most important Python package for data analysis. With over 100 million downloads per month, it is the de facto standard package for data manipulation and exploratory data analysis. Its ability to read from and write to an extensive list of formats makes it a versatile tool for data science practitioners. Its data manipulation functions make it a highly accessible and practical tool for aggregating, analyzing, and cleaning data.

In our blog post on how to learn pandas, we discussed the learning path you may take to master this package. This beginner-friendly tutorial will cover all the basic concepts and illustrate pandas' different functions. You can also check out our course on pandas Foundations for further details.

This article is aimed at beginners with basic knowledge of Python and no prior experience with pandas to help you get started.

---

## What is pandas used for?

pandas is a data manipulation package in Python for tabular data—that is, data in the form of rows and columns, also known as DataFrames. Intuitively, you can think of a DataFrame as an Excel sheet.

pandas’ functionality includes data transformations, like sorting rows and taking subsets, calculating summary statistics such as the mean, reshaping DataFrames, and joining DataFrames together. pandas works well with other popular Python data science packages, often called the PyData ecosystem, including:

- **NumPy** for numerical computing
- **Matplotlib**, **Seaborn**, **Plotly**, and other data visualization packages
- **scikit-learn** for machine learning

---

## Key benefits of the pandas package

Undoubtedly, pandas is a powerful data manipulation tool packaged with several benefits, including:

- **Made for Python**: Python is the world's most popular language for machine learning and data science.
- **Less verbose per unit operation**: Code written in pandas is less verbose, requiring fewer lines of code to get the desired output.
- **Intuitive view of data**: pandas offers exceptionally intuitive data representation that facilitates easier data understanding and analysis.
- **Extensive feature set**: It supports an extensive set of operations from exploratory data analysis, dealing with missing values, calculating statistics, visualizing univariate and bivariate data, and much more.
- **Works with large data**: pandas handles large data sets with ease. It offers speed and efficiency while working with datasets of the order of millions of records and hundreds of columns, depending on the machine.

---

## How to install pandas?

Before delving into its functionality, let us first install pandas. You can avoid this step by registering for a free DataCamp account and using DataLab, a cloud-based IDE that comes with pandas (alongside the top Python data science packages) pre-installed.

### Install pandas

Installing pandas is straightforward; just use the `pip install` command in your terminal.

In [None]:
!pip install pandas



------------------------------------------------------------------------

## Importing data in pandas

To begin working with pandas, import the pandas Python package as shown
below. When importing pandas, the most common alias for pandas is `pd`.


In [None]:
import pandas as pd

##Series
The first main data type we will learn about for pandas is the Series data type.

A Series is very similar to a NumPy array (in fact it is built on top of the NumPy array object). What differentiates the NumPy array from a Series, is that a Series can have axis labels, meaning it can be indexed by a label, instead of just a number location. It also doesn't need to hold numeric data, it can hold any arbitrary Python Object.


### Index and Data Lists

We can create a Series from Python lists (also from NumPy arrays)

In [None]:
myindex = ['USA','Canada','Mexico','Pakistan']

In [None]:
mydata = [[1776,1867,1821,1947],[1,2,3,4]]

In [None]:
myser = pd.Series(data=mydata ,index=myindex)

ValueError: Length of values (2) does not match length of index (4)

In [None]:
myser

Unnamed: 0,0
USA,
Canada,
Mexico,
Pakistan,


In [None]:
pd.Series(data=mydata,index=myindex)

ValueError: Length of values (2) does not match length of index (4)

### From a  Dictionary

In [None]:
ages = {'Sammy':5,'Frank':10,'Spike':7}

In [None]:
ages

{'Sammy': 5, 'Frank': 10, 'Spike': 7}

In [None]:
pd.Series(ages)

Unnamed: 0,0
Sammy,5
Frank,10
Spike,7


### Creating a DataFrame from Python Objects

In [None]:
# help(pd.DataFrame)

In [None]:
import numpy as np

In [None]:
# Make sure the seed is in the same cell as the random call
# https://stackoverflow.com/questions/21494489/what-does-numpy-random-seed0-do
np.random.seed(101)
mydata = np.random.randint(0,101,(4,4))

In [None]:
mydata

array([[95, 11, 81, 70],
       [63, 87, 75,  9],
       [77, 40,  4, 63],
       [40, 60, 92, 64]])

In [None]:
myindex = ['CA','NY','AZ','TX']

In [None]:
mycolumns = ['Jan','Feb','Mar','Apr']

In [None]:
df = pd.DataFrame(data=mydata,index=mycolumns)
df

Unnamed: 0,0,1,2,3
Jan,95,11,81,70
Feb,63,87,75,9
Mar,77,40,4,63
Apr,40,60,92,64


ValueError: Could not interpret value `0` for `x`. An entry with this name does not appear in `data`.

from matplotlib import pyplot as plt
import seaborn as sns
figsize = (12, 1.2 * len(_df_12['index'].unique()))
plt.figure(figsize=figsize)
sns.violinplot(_df_12, x='0', y='index', inner='stick', palette='Dark2')
sns.despine(top=True, right=True, bottom=True, left=True)

ValueError: Could not interpret value `1` for `x`. An entry with this name does not appear in `data`.

from matplotlib import pyplot as plt
import seaborn as sns
figsize = (12, 1.2 * len(_df_13['index'].unique()))
plt.figure(figsize=figsize)
sns.violinplot(_df_13, x='1', y='index', inner='stick', palette='Dark2')
sns.despine(top=True, right=True, bottom=True, left=True)

ValueError: Could not interpret value `2` for `x`. An entry with this name does not appear in `data`.

from matplotlib import pyplot as plt
import seaborn as sns
figsize = (12, 1.2 * len(_df_14['index'].unique()))
plt.figure(figsize=figsize)
sns.violinplot(_df_14, x='2', y='index', inner='stick', palette='Dark2')
sns.despine(top=True, right=True, bottom=True, left=True)

ValueError: Could not interpret value `3` for `x`. An entry with this name does not appear in `data`.

from matplotlib import pyplot as plt
import seaborn as sns
figsize = (12, 1.2 * len(_df_15['index'].unique()))
plt.figure(figsize=figsize)
sns.violinplot(_df_15, x='3', y='index', inner='stick', palette='Dark2')
sns.despine(top=True, right=True, bottom=True, left=True)

In [None]:
df = pd.DataFrame(data=mydata,index=myindex,columns=mycolumns)
df

Unnamed: 0,Jan,Feb,Mar,Apr
CA,95,11,81,70
NY,63,87,75,9
AZ,77,40,4,63
TX,40,60,92,64


In [None]:
df = pd.DataFrame(data=mydata,index=myindex,columns=mycolumns)
df

Unnamed: 0,Jan,Feb,Mar
CA,95,11,81
NY,70,63,87
AZ,75,9,77
TX,40,4,63


### generating Sythetic data with Pandas


In [None]:
import pandas as pd
import numpy as np

# Set random seed for reproducibility
np.random.seed(42)

# Number of samples
num_samples = 100

# Generate synthetic data
data = {
    'CustomerID': range(1, num_samples + 1),
    'Gender': np.random.choice(['Male', 'Female'], size=num_samples),
    'Age': np.random.randint(18, 70, size=num_samples),
    'Geography': np.random.choice(['France', 'Spain', 'Germany'], size=num_samples),
    'Tenure': np.random.randint(0, 10, size=num_samples),
    'Balance': np.round(np.random.uniform(0, 250000, size=num_samples), 2),
    'NumOfProducts': np.random.randint(1, 5, size=num_samples),
    'HasCrCard': np.random.choice([0, 1], size=num_samples),
    'IsActiveMember': np.random.choice([0, 1], size=num_samples),
    'EstimatedSalary': np.round(np.random.uniform(0, 200000, size=num_samples), 2),
    'Churn': np.random.choice(['Yes', 'No'], size=num_samples, p=[0.2, 0.8])
}

df = pd.DataFrame(data)

# Introduce missing values (10% missing in 'Age', 'Balance', 'EstimatedSalary')
for col in ['Age', 'Balance', 'EstimatedSalary']:
   df.loc[df.sample(frac=0.50).index, col] = np.nan  # 10% missing

# Display the first few rows
df.head(10)
#df.tail(2)



Unnamed: 0,CustomerID,Gender,Age,Geography,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Churn
0,1,Male,35.0,Germany,8,89661.7,1,0,0,16549.68,No
1,2,Female,,Germany,8,,1,1,0,,No
2,3,Male,,Germany,1,,1,1,0,110317.57,No
3,4,Male,51.0,Spain,6,,3,0,0,32966.85,Yes
4,5,Male,27.0,Spain,9,212167.45,3,1,1,,No
5,6,Female,,Spain,2,34155.33,1,0,1,,Yes
6,7,Male,,France,6,,4,1,0,96074.02,No
7,8,Male,48.0,Spain,9,,3,0,0,197057.21,Yes
8,9,Male,65.0,France,8,74127.54,1,0,1,,No
9,10,Female,,France,3,,1,0,0,149915.66,No


In [None]:

df.to_csv('synthetic.csv',index=False)



### Understanding File Paths

You have two options when reading a file with pandas:

1. If your .py file or .ipynb notebook is located in the **exact** same folder location as the .csv file you want to read, simply pass in the file name as a string, for example:
    
        df = pd.read_csv('some_file.csv')
        
2. Pass in the entire file path if you are located in a different directory. The file path must be 100% correct in order for this to work. For example:

        df = pd.read_csv("C:\\Users\\myself\\files\\some_file.csv")

In [None]:
df = pd.read_csv('Titanic-Dataset.csv')

In [None]:
df.describe()

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,257.353842,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0
25%,223.5,0.0,2.0,20.125,0.0,0.0,7.9104
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.4542
75%,668.5,1.0,3.0,38.0,1.0,0.0,31.0
max,891.0,1.0,3.0,80.0,8.0,6.0,512.3292


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


In [None]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


#### Print your current directory file path with pwd

In [None]:
pwd

'/content'

#### List the files in your current directory with ls

In [None]:
ls

[0m[01;34msample_data[0m/  Titanic-Dataset.csv



## Outputting data in pandas

Just as pandas can import data from various file types, it also allows
you to export data into various formats. This happens especially when
data is transformed using pandas and needs to be saved locally on your
machine. Below is how to output pandas DataFrames into various formats.

### Outputting a DataFrame into a CSV file

A pandas DataFrame (here we are using `df`) is saved as a CSV file using
the `.to_csv()` method. The arguments include the filename with path and
`index` – where `index=True` implies writing the DataFrame’s index.


In [None]:
# Save the DataFrame to a CSV file
df.to_csv('sample_data.csv', index=False)
df.to_csv('sample_data.txt', index=False)


### Importing CSV files

Use `read_csv()` with the path to the CSV file to read a comma-separated
values file.

In [None]:
df = pd.read_csv("sample_data.csv")


This read operation loads the CSV file `diabetes.csv` to generate a
pandas DataFrame object `df`. Throughout this tutorial, you’ll see how
to manipulate such DataFrame objects.

### Importing text files

Reading text files is similar to CSV files. The only nuance is that you
need to specify a separator with the `sep` argument, as shown below. The
separator argument refers to the symbol used to separate values in a
DataFrame. Comma (`sep=","`), whitespace (`sep="\s+"`), tab
(`sep="\t"`), and colon (`sep=":"`) are the commonly used separators.
Here `\s+` represents one or more whitespace characters.

In [None]:
df = pd.read_csv("sample_data.txt", sep=",")


### Importing Excel files (single sheet)

Reading Excel files (both XLS and XLSX) is as easy as the `read_excel()`
function, using the file path as an input.

``` python
df = pd.read_excel('diabetes.xlsx')
```

You can also specify other arguments, such as `header` to specify which
row becomes the DataFrame’s header. It has a default value of `0`, which
denotes the first row as headers or column names. You can also specify
column names as a list in the `names` argument. The `index_col` (default
is `None`) argument can be used if the file contains a row index.

> **Note**: In a pandas DataFrame or Series, the **index** is an
> identifier that points to the location of a row or column in a pandas
> DataFrame. In a nutshell, the index labels the row or column of a
> DataFrame and lets you access a specific row or column by using its
> index (you will see this later on). A DataFrame’s row index can be a
> range (e.g., 0 to 303), a time series (dates or timestamps), a unique
> identifier (e.g., `employee_ID` in an employees table), or other types
> of data. For columns, it’s usually a string (denoting the column
> name).

If you want to learn more about importing data with pandas, check out
this [cheat sheet on importing various file types with
Python](https://www.datacamp.com/community/blog/python-data-import-cheat-sheet).


## Viewing and understanding DataFrames using pandas

After reading tabular data as a DataFrame, you would need to have a
glimpse of the data. You can either view a small sample of the dataset
or a summary of the data in the form of summary statistics.

### Viewing data using `.head()` and `.tail()`

You can view the first few or last few rows of a DataFrame using the
`.head()` or `.tail()` methods, respectively. You can specify the number
of rows through the `n` argument (the default value is 5).

In [None]:
df.head()

In [None]:
df.tail(10)

### Understanding data using `.describe()`

The `.describe()` method prints the summary statistics of all numeric
columns, such as count, mean, standard deviation, minimum, maximum, and
quartiles of numeric columns.



In [None]:
df.describe()


You can also modify the quartiles using the `percentiles` argument.
Here, for example, we’re looking at the 30%, 50%, and 70% percentiles of
the numeric columns in DataFrame `df`.

In [None]:
df.describe(percentiles=[0.3, 0.5, 0.7])

Unnamed: 0,CustomerID,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary
count,100.0,50.0,100.0,50.0,100.0,100.0,100.0,50.0
mean,50.5,41.74,3.9,121110.2922,2.32,0.45,0.5,91275.4482
std,29.011492,14.00701,2.805118,79923.276713,1.188157,0.5,0.502519,49737.844459
min,1.0,18.0,0.0,3826.14,1.0,0.0,0.0,6360.94
30%,30.7,32.0,2.0,63650.757,1.0,0.0,0.0,63488.22
50%,50.5,40.5,4.0,109268.375,2.0,0.0,0.5,97853.53
70%,70.3,51.3,6.0,173288.292,3.0,1.0,1.0,117318.291
max,100.0,68.0,9.0,245460.22,4.0,1.0,1.0,197057.21



You can also isolate specific data types in your summary output by using
the `include` argument. Here, for example, we’re only summarizing the
columns with the integer data type.

Similarly, you might want to exclude certain data types using the
`exclude` argument.

Often, practitioners find it easy to view such statistics by transposing
them with the `.T` attribute.

In [None]:

df.describe(include=[int])

Unnamed: 0,CustomerID,Tenure,NumOfProducts,HasCrCard,IsActiveMember
count,100.0,100.0,100.0,100.0,100.0
mean,50.5,3.9,2.32,0.45,0.5
std,29.011492,2.805118,1.188157,0.5,0.502519
min,1.0,0.0,1.0,0.0,0.0
25%,25.75,2.0,1.0,0.0,0.0
50%,50.5,4.0,2.0,0.0,0.5
75%,75.25,6.25,3.0,1.0,1.0
max,100.0,9.0,4.0,1.0,1.0


In [None]:
df.describe(exclude=[int])

Unnamed: 0,Gender,Age,Geography,Balance,EstimatedSalary,Churn
count,100,50.0,100,50.0,50.0,100
unique,2,,3,,,2
top,Female,,Germany,,,No
freq,56,,35,,,78
mean,,41.74,,121110.2922,91275.4482,
std,,14.00701,,79923.276713,49737.844459,
min,,18.0,,3826.14,6360.94,
25%,,30.25,,51432.9675,56107.7425,
50%,,40.5,,109268.375,97853.53,
75%,,52.75,,188758.29,121309.9475,


In [None]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
CustomerID,100.0,50.5,29.011492,1.0,25.75,50.5,75.25,100.0
Age,50.0,41.74,14.00701,18.0,30.25,40.5,52.75,68.0
Tenure,100.0,3.9,2.805118,0.0,2.0,4.0,6.25,9.0
Balance,50.0,121110.2922,79923.276713,3826.14,51432.9675,109268.375,188758.29,245460.22
NumOfProducts,100.0,2.32,1.188157,1.0,1.0,2.0,3.0,4.0
HasCrCard,100.0,0.45,0.5,0.0,0.0,0.0,1.0,1.0
IsActiveMember,100.0,0.5,0.502519,0.0,0.0,0.5,1.0,1.0
EstimatedSalary,50.0,91275.4482,49737.844459,6360.94,56107.7425,97853.53,121309.9475,197057.21



### Understanding data using `.info()`

The `.info()` method is a quick way to look at the data types, missing
values, and data size of a DataFrame. Here, we’re setting the
`show_counts` argument to `True`, which gives a view over the total
non-missing values in each column. We’re also setting `memory_usage` to
`True`, which shows the total memory usage of the DataFrame elements.
When `verbose` is set to `True`, it prints the full summary from
`.info()`.

In [None]:
df.info(show_counts=True, memory_usage=True, verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   CustomerID       100 non-null    int64  
 1   Gender           100 non-null    object 
 2   Age              50 non-null     float64
 3   Geography        100 non-null    object 
 4   Tenure           100 non-null    int64  
 5   Balance          50 non-null     float64
 6   NumOfProducts    100 non-null    int64  
 7   HasCrCard        100 non-null    int64  
 8   IsActiveMember   100 non-null    int64  
 9   EstimatedSalary  50 non-null     float64
 10  Churn            100 non-null    object 
dtypes: float64(3), int64(5), object(3)
memory usage: 8.7+ KB



### Understanding your data using `.shape`

The number of rows and columns of a DataFrame can be identified using
the `.shape` attribute of the DataFrame. It returns a tuple
`(row, column)` and can be indexed to get only the number of rows or
columns.

In [None]:
df.shape       # Get the number of rows and columns
df.shape[0]    # Get the number of rows only
df.shape[1]    # Get the number of columns only

11


### Get all columns and column names

Calling the `.columns` attribute of a DataFrame object returns the
column names in the form of an Index object.


It can be converted to a list using the `list()` function.

In [None]:
#print(df.columns)
list(df.columns)


['CustomerID',
 'Gender',
 'Age',
 'Geography',
 'Tenure',
 'Balance',
 'NumOfProducts',
 'HasCrCard',
 'IsActiveMember',
 'EstimatedSalary',
 'Churn']

Run the two cells below to generate our data for further tasks.

In [None]:
# @title
%%writefile tips.csv
total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251
25.29,4.71,Male,No,Sun,Dinner,4,6.32,Erik Smith,213140353657882,Sun9679
8.77,2.0,Male,No,,Dinner,2,4.38,Kristopher Johnson,2223727524230344,Sun5985
26.88,3.12,Male,No,Sun,Dinner,4,6.72,Robert Buck,3514785077705092,Sun8157
15.04,1.96,Male,No,Sun,Dinner,2,7.52,Joseph Mcdonald,3522866365840377,Sun6820
14.78,3.23,Male,No,Sun,Dinner,2,7.39,Jerome Abbott,3532124519049786,Sun3775
10.27,1.71,Male,No,Sun,Dinner,2,5.14,William Riley,566287581219,Sun2546
35.26,5.0,Female,No,Sun,Dinner,,8.82,Diane Macias,4577817359320969,Sun6686
15.42,1.57,Male,No,Sun,Dinner,,7.71,Chad Harrington,577040572932,Sun1300
18.43,3.0,Male,No,Sun,Dinner,4,4.61,Joshua Jones,6011163105616890,Sun2971
14.83,3.02,Female,No,Sun,Dinner,2,7.42,Vanessa Jones,30016702287574,Sun3848
21.58,3.92,Male,No,Sun,Dinner,2,10.79,Matthew Reilly,180073029785069,Sun1878
10.33,1.67,Female,No,Sun,Dinner,3,3.44,Elizabeth Foster,4240025044626033,Sun9715
16.29,3.71,Male,No,Sun,Dinner,3,5.43,John Pittman,6521340257218708,Sun2998
16.97,3.5,Female,No,Sun,Dinner,3,5.66,Laura Martinez,30422275171379,Sun2789
20.65,3.35,Male,No,Sat,Dinner,3,6.88,Timothy Oneal,6568069240986485,Sat9213
17.92,4.08,Male,No,Sat,Dinner,2,8.96,Thomas Rice,4403296224639756,Sat1709
20.29,2.75,Female,No,Sat,Dinner,,10.14,Natalie Gardner,5448125351489749,Sat9618
15.77,2.23,Female,No,Sat,Dinner,2,7.88,Ashley Shelton,3524119516293213,Sat9786
39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,Sat239
19.82,3.18,Male,No,Sat,Dinner,2,9.91,Christopher Ross,36739148167928,Sat6236
17.81,2.34,Male,No,Sat,Dinner,4,4.45,Robert Perkins,30502930499388,Sat907
13.37,2.0,Male,No,Sat,Dinner,2,6.68,Kyle Avery,6531339539615499,Sat6651
12.69,2.0,Male,No,Sat,Dinner,2,6.34,Patrick Barber,30155551880343,Sat394
21.7,4.3,Male,No,Sat,Dinner,2,10.85,David Collier,5529694315416009,Sat3697
19.65,3.0,Female,No,Sat,Dinner,2,9.82,Melinda Murphy,5489272944576051,Sat2467
9.55,1.45,Male,No,Sat,Dinner,2,4.78,Grant Hall,30196517521548,Sat4099
18.35,2.5,Male,No,Sat,Dinner,4,4.59,Danny Santiago,630415546013,Sat4947
15.06,3.0,Female,No,Sat,Dinner,2,7.53,Amanda Wilson,213186304291560,Sat1327
20.69,2.45,Female,No,Sat,Dinner,4,5.17,Amber Francis,377742985258914,Sat6649
17.78,3.27,Male,No,Sat,Dinner,2,8.89,Jacob Castillo,3551492000704805,Sat8124
24.06,3.6,Male,No,Sat,,3,8.02,Joseph Mullins,5519770449260299,Sat632
16.31,2.0,Male,No,Sat,Dinner,3,5.44,William Ford,3527691170179398,Sat9139
16.93,3.07,Female,No,Sat,Dinner,3,5.64,Erin Lewis,5161695527390786,Sat6406
18.69,2.31,Male,No,Sat,Dinner,3,6.23,Brandon Bradley,4427601595688633,Sat4056
31.27,5.0,Male,No,Sat,Dinner,3,10.42,Mr. Brandon Berry,6011525851069856,Sat6373
16.04,2.24,Male,No,Sat,Dinner,3,5.35,Adam Edwards,3544447755679420,Sat8549
17.46,2.54,Male,No,Sun,Dinner,2,8.73,David Boyer,3536678244278149,Sun9460
13.94,3.06,Male,No,Sun,Dinner,2,6.97,Bryan Brown,36231182760859,Sun1699
9.68,1.32,Male,No,Sun,Dinner,,4.84,Christopher Spears,4387671121369212,Sun3279
30.4,5.6,Male,No,Sun,Dinner,4,7.6,Todd Cooper,503846761263,Sun2274
18.29,3.0,Male,No,Sun,Dinner,2,9.14,Richard Fitzgerald,375156610762053,Sun8643
22.23,5.0,Male,No,Sun,Dinner,2,11.12,Joshua Gilmore,4292072734899,Sun7097
32.4,6.0,Male,No,Sun,Dinner,4,8.1,James Barnes,3552002592874186,Sun9677
28.55,2.05,Male,No,Sun,Dinner,3,9.52,Austin Fisher,6011481668986587,Sun4142
18.04,3.0,Male,No,Sun,Dinner,2,9.02,William Roth,6573923967142503,Sun9774
12.54,2.5,Male,No,Sun,Dinner,2,6.27,Jeremiah Neal,2225400829691416,Sun2021
10.29,2.6,Female,No,Sun,Dinner,2,5.14,Jessica Ibarra,4999759463713,Sun4474
34.81,5.2,Female,No,Sun,Dinner,4,8.7,Emily Daniel,4291280793094374,Sun6165
9.94,1.56,Male,No,Sun,Dinner,2,4.97,Curtis Morgan,4628628020417301,Sun4561
25.56,4.34,Male,No,Sun,Dinner,4,6.39,Ronald Owens,6569607991983380,Sun9470
19.49,3.51,Male,No,Sun,Dinner,2,9.74,Michael Hamilton,6502227786581768,Sun1118
38.01,3.0,Male,Yes,Sat,Dinner,,9.5,James Christensen DDS,349793629453226,Sat8903
26.41,1.5,Female,No,Sat,Dinner,2,13.2,Melody Simon,4745394421258160,Sat8980
11.24,1.76,Male,Yes,Sat,Dinner,2,5.62,Troy Guerrero,3560782621035582,Sat6683
48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139
20.29,3.21,Male,Yes,Sat,Dinner,2,10.14,Anthony Mclean,347614304015027,Sat2353
13.81,2.0,Male,Yes,Sat,Dinner,2,6.9,Ryan Hernandez,4766834726806,Sat3030
11.02,1.98,Male,Yes,Sat,Dinner,2,5.51,Joseph Hart,180046232326178,Sat2265
18.29,3.76,Male,Yes,Sat,Dinner,4,4.57,Chad Hart,580171498976,Sat4178
17.59,2.64,Male,No,Sat,Dinner,3,5.86,Michael Johnson,2222114458088108,Sat1667
20.08,3.15,Male,No,Sat,Dinner,3,6.69,Justin Dixon,180021262464926,Sat6840
16.45,2.47,Female,No,Sat,Dinner,2,8.22,Rachel Vaughn,3569262692675583,Sat4750
3.07,1.0,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455
20.23,2.01,Male,No,Sat,Dinner,2,10.12,Mr. Travis Bailey Jr.,060406789937,Sat561
15.01,2.09,Male,Yes,Sat,Dinner,2,7.5,Adam Hall,4700924377057571,Sat855
12.02,1.97,Male,No,Sat,Dinner,2,6.01,Max Brown,213139760497718,Sat2100
17.07,3.0,Female,No,Sat,Dinner,3,5.69,Teresa Fisher,5442222963796367,Sat3469
26.86,3.14,Female,Yes,Sat,Dinner,2,13.43,Victoria Obrien MD,4216245673726,Sat1967
25.28,5.0,Female,Yes,Sat,Dinner,2,12.64,Julie Holmes,5418689346409571,Sat6065
14.73,2.2,Female,No,Sat,Dinner,2,7.36,Ashley Harris,501828723483,Sat6548
10.51,1.25,Male,No,Sat,Dinner,2,5.26,Kenneth Hayes,213142079731108,Sat5056
17.92,3.08,Male,Yes,Sat,Dinner,2,8.96,Mark Smith,676188485350,Sat9908
27.2,4.0,Male,No,Thur,Lunch,4,6.8,John Davis,30344778738589,Thur4924
22.76,3.0,Male,No,Thur,Lunch,2,11.38,Chris Hahn,3591887177014031,Thur2863
17.29,2.71,Male,No,Thur,Lunch,2,8.64,Brian Diaz,4759290988169738,Thur9501
19.44,3.0,Male,Yes,Thur,Lunch,2,9.72,Louis Torres,38848369968464,Thur6453
16.66,3.4,Male,No,Thur,Lunch,2,8.33,William Martin,4550549048402707,Thur8232
10.07,1.83,Female,No,Thur,Lunch,1,10.07,Julie Moody,630413282843,Thur4909
32.68,5.0,Male,Yes,Thur,Lunch,2,16.34,Daniel Murphy,5356177501009133,Thur8801
15.98,2.03,Male,No,Thur,Lunch,2,7.99,Jason Jones,4442984204923315,Thur1944
34.83,5.17,Female,No,Thur,Lunch,4,8.71,Shawna Cook,6011787464177340,Thur7972
13.03,2.0,Male,No,Thur,Lunch,2,6.52,Derek Thomas,213161022097557,Thur6793
18.28,4.0,Male,No,Thur,Lunch,2,9.14,Donald Williams,5363745772301404,Thur3636
24.71,5.85,Male,No,Thur,Lunch,2,12.36,Roger Taylor,4410248629955,Thur9003
21.16,3.0,Male,No,Thur,Lunch,2,10.58,Keith Lewis,4356005144080422,Thur6273
28.97,3.0,Male,Yes,Fri,Dinner,2,14.48,Daniel Mason,3597456900644078,Fri4175
22.49,3.5,Male,No,Fri,Dinner,2,11.24,Earl Horn,6011849326227398,Fri5700
5.75,1.0,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,Fri3780
16.32,4.3,Female,Yes,Fri,Dinner,2,8.16,Natalie Nguyen,5181236182893396,Fri6963
22.75,3.25,Female,No,Fri,Dinner,2,11.38,Jamie Garza,676318332068,Fri2318
40.17,4.73,Male,Yes,Fri,Dinner,4,10.04,Aaron Bentley,180026611638690,Fri9628
27.28,4.0,Male,Yes,Fri,Dinner,2,13.64,Eric Carter,4563054452787961,Fri3159
12.03,1.5,Male,Yes,Fri,Dinner,2,6.02,Eric Herrera,580116092652,Fri9268
21.01,3.0,Male,Yes,Fri,Dinner,2,10.5,Michael Li,4831801127457917,Fri144
12.46,1.5,Male,No,Fri,Dinner,2,6.23,Edward Carter,347435564751626,Fri5575
11.35,2.5,Female,Yes,Fri,Dinner,2,5.68,Lori Lynch,38558279384492,Fri4106
15.38,3.0,Female,Yes,Fri,Dinner,2,7.69,Tiffany Colon,6011012799432041,Fri8382
44.3,2.5,Female,Yes,Sat,Dinner,3,14.77,Heather Cohen,379771118886604,Sat6240
22.42,3.48,Female,Yes,Sat,Dinner,2,11.21,Kathleen Hawkins,348009865484721,Sat1015
20.92,4.08,Female,No,Sat,Dinner,2,10.46,Gabrielle Frederick,4013010878990106,Sat3194
15.36,1.64,Male,Yes,Sat,Dinner,2,7.68,David Price,4029957452720,Sat5106
20.49,4.06,Male,Yes,Sat,Dinner,2,10.24,Karl Mcdaniel,180024452771522,Sat7865
25.21,4.29,Male,Yes,Sat,Dinner,2,12.6,Jason Mullen,4738781782868,Sat5196
18.24,3.76,Male,No,Sat,Dinner,2,9.12,Steven Grant,4112810433473856,Sat6376
14.31,4.0,Female,Yes,Sat,Dinner,2,7.16,Amanda Anderson,375638820334211,Sat2614
14.0,3.0,Male,No,Sat,Dinner,2,7.0,James Sanchez,345243048851323,Sat6801
7.25,1.0,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801
38.07,4.0,Male,No,Sun,Dinner,3,12.69,Jeff Lopez,3572865915176463,Sun591
23.95,2.55,Male,No,Sun,Dinner,2,11.98,John Joyce,6565964211060570,Sun1885
25.71,4.0,Female,No,Sun,Dinner,3,8.57,Katie Smith,5400160161311292,Sun6492
17.31,3.5,Female,No,Sun,Dinner,2,8.65,Kayla Stone,379494319310858,Sun8746
29.93,5.07,Male,No,Sun,Dinner,4,7.48,Shawn Blake,4689079711213722,Sun22
10.65,1.5,Female,No,Thur,Lunch,2,5.32,Linda Zhang,3560509622598239,Thur9593
12.43,1.8,Female,No,Thur,Lunch,2,6.22,Dr. Caroline Tucker,502047186908,Thur8084
24.08,2.92,Female,No,Thur,Lunch,4,6.02,Melanie Jordan,676212062720,Thur8063
11.69,2.31,Male,No,Thur,Lunch,2,5.84,Kenneth Goodman,4891259691010,Thur8289
13.42,1.68,Female,No,Thur,Lunch,2,6.71,Laura Garcia,5181484390945653,Thur2158
14.26,2.5,Male,No,Thur,Lunch,2,7.13,Perry Garcia,180034646320219,Thur3579
15.95,2.0,Male,No,Thur,Lunch,2,7.98,Christopher Lang,4820629318698319,Thur1992
12.48,2.52,Female,No,Thur,Lunch,2,6.24,Jordan Diaz,4472778228206399,Thur208
29.8,4.2,Female,No,Thur,Lunch,6,4.97,Angela Sanchez,503857080488,Thur3948
8.52,1.48,Male,No,Thur,Lunch,2,4.26,Mario Bradshaw,4524404353861811,Thur6719
14.52,2.0,Female,No,Thur,Lunch,2,7.26,Jessica Simmons,213102478792200,Thur1512
11.38,2.0,Female,No,Thur,Lunch,2,5.69,Christine Perkins,3548391118913991,Thur8551
22.82,2.18,Male,No,Thur,Lunch,3,7.61,Raymond Torres,4855776744024,Thur9424
19.08,1.5,Male,No,Thur,Lunch,2,9.54,Seth Sexton,213113680829581,Thur1446
20.27,2.83,Female,No,Thur,Lunch,2,10.14,Ashley Burke,5394652998970066,Thur9005
11.17,1.5,Female,No,Thur,Lunch,2,5.58,Taylor Gonzalez,6011990685390011,Thur7783
12.26,2.0,Female,No,Thur,Lunch,2,6.13,Kaitlin Wolf,676348318145,Thur1561
18.26,3.25,Female,No,Thur,Lunch,2,9.13,Karen Rodriguez,4952604748911,Thur75
8.51,1.25,Female,No,Thur,Lunch,2,4.26,Rebecca Harris,4320272020376174,Thur6600
10.33,2.0,Female,No,Thur,Lunch,2,5.16,Donna Kelly,180048553626376,Thur1393
14.15,2.0,Female,No,Thur,Lunch,2,7.08,Vanessa Morris,213189344156819,Thur3890
16.0,2.0,Male,Yes,Thur,Lunch,2,8.0,Jason Burgess,3561461821942363,Thur2710
13.16,2.75,Female,No,Thur,Lunch,2,6.58,Lindsey Meyer,676239597203,Thur6245
17.47,3.5,Female,No,Thur,Lunch,2,8.74,Kayla Rios,5233918213804470,Thur3906
34.3,6.7,Male,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,Thur1025
41.19,5.0,Male,No,Thur,Lunch,5,8.24,Eric Andrews,4356531761046453,Thur3621
27.05,5.0,Female,No,Thur,Lunch,6,4.51,Regina Jones,4311048695487,Thur6179
16.43,2.3,Female,No,Thur,Lunch,2,8.22,Linda Jones,6542729219645658,Thur9002
8.35,1.5,Female,No,Thur,Lunch,2,4.18,Amy Young,4285454264477,Thur9331
18.64,1.36,Female,No,Thur,Lunch,3,6.21,Kelly Estrada,060463302327,Thur3941
11.87,1.63,Female,No,Thur,Lunch,2,5.94,Annette Cunningham,675937746864,Thur4780
9.78,1.73,Male,No,Thur,Lunch,2,4.89,David Stewart,3578014604116399,Thur7276
7.51,2.0,Male,No,Thur,Lunch,2,3.76,Daniel Robbins,4823139288341889,Thur6321
14.07,2.5,Male,No,Sun,Dinner,2,7.04,Luke Rice,4813617017359506,Sun8863
13.13,2.0,Male,No,Sun,Dinner,2,6.56,Jason Arnold,3571825125296106,Sun2127
17.26,2.74,Male,No,Sun,Dinner,3,5.75,Gregory Smith,4292362333741,Sun5205
24.55,2.0,Male,No,Sun,Dinner,4,6.14,Todd Patterson,4416804908942159,Sun8670
19.77,2.0,Male,No,Sun,Dinner,4,4.94,James Smith,213169731428229,Sun5814
29.85,5.14,Female,No,Sun,Dinner,5,5.97,Madison Wilson,4210875236164664,Sun9176
48.17,5.0,Male,No,Sun,Dinner,6,8.03,Ryan Gonzales,3523151482063321,Sun7518
25.0,3.75,Female,No,Sun,Dinner,4,6.25,Laura Robles,213158685144262,Sun7015
13.39,2.61,Female,No,Sun,Dinner,2,6.7,Ashley Boyd,3571088058115021,Sun982
16.49,2.0,Male,No,Sun,Dinner,4,4.12,Christopher Soto,30501814271434,Sun1781
21.5,3.5,Male,No,Sun,Dinner,4,5.38,Travis Gonzalez,3527668419764685,Sun245
12.66,2.5,Male,No,Sun,Dinner,2,6.33,Brandon Oconnor,4406882156920533,Sun5879
16.21,2.0,Female,No,Sun,Dinner,3,5.4,Jennifer Baird,4227834176859693,Sun5521
13.81,2.0,Male,No,Sun,Dinner,2,6.9,Charles Newton,5552793481414044,Sun8594
17.51,3.0,Female,Yes,Sun,Dinner,2,8.76,Audrey Griffin,3500853929693258,Sun444
24.52,3.48,Male,No,Sun,Dinner,3,8.17,Jacob Hansen,4031116007387,Sun9043
20.76,2.24,Male,No,Sun,Dinner,2,10.38,Gordon Lane,4110599849536479,Sun6738
31.71,4.5,Male,No,Sun,Dinner,4,7.93,Michael Lawson,3566285921227119,Sun3719
10.59,1.61,Female,Yes,Sat,Dinner,2,5.3,Sara Jimenez,502053147208,Sat9795
10.63,2.0,Female,Yes,Sat,Dinner,2,5.32,Amy Hill,3536332481454019,Sat1788
50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954
15.81,3.16,Male,Yes,Sat,Dinner,2,7.9,David Hall,502004138207,Sat6750
7.25,5.15,Male,Yes,Sun,Dinner,2,3.62,Larry White,30432617123103,Sun9209
31.85,3.18,Male,Yes,Sun,Dinner,2,15.92,Scott Perez,3577115550328507,Sun9335
16.82,4.0,Male,Yes,Sun,Dinner,2,8.41,Brian Miles,3586342145399277,Sun7621
32.9,3.11,Male,Yes,Sun,Dinner,2,16.45,Nathan Reynolds,370307040837149,Sun5109
17.89,2.0,Male,Yes,Sun,Dinner,2,8.94,Walter Simmons,6011481578696110,Sun5961
14.48,2.0,Male,Yes,Sun,Dinner,2,7.24,John Dudley,4565183162071073,Sun6203
9.6,4.0,Female,Yes,Sun,Dinner,2,4.8,Melanie Gray,4211808859168,Sun4598
34.63,3.55,Male,Yes,Sun,Dinner,2,17.32,Brian Bailey,346656312114848,Sun9851
34.65,3.68,Male,Yes,Sun,Dinner,4,8.66,James Hebert DDS,676168737648,Sun7544
23.33,5.65,Male,Yes,Sun,Dinner,2,11.66,Jason Cox,6556931703586223,Sun3402
45.35,3.5,Male,Yes,Sun,Dinner,3,15.12,Jose Parsons,4112207559459910,Sun2337
23.17,6.5,Male,Yes,Sun,Dinner,4,5.79,Dr. Michael James,4718501859162,Sun6059
40.55,3.0,Male,Yes,Sun,Dinner,2,20.27,Stephen Cox,3547798222044029,Sun5140
20.69,5.0,Male,No,Sun,Dinner,5,4.14,Joseph Howell,30362407455623,Sun5842
20.9,3.5,Female,Yes,Sun,Dinner,3,6.97,Heidi Atkinson,4422858423131187,Sun4254
30.46,2.0,Male,Yes,Sun,Dinner,5,6.09,David Barrett,4792882899700988,Sun9987
18.15,3.5,Female,Yes,Sun,Dinner,3,6.05,Glenda Wiggins,578329325307,Sun430
23.1,4.0,Male,Yes,Sun,Dinner,3,7.7,Richard Stevens,3560193117506187,Sun1821
15.69,1.5,Male,Yes,Sun,Dinner,2,7.84,Riley Barnes,180053549128800,Sun5104
19.81,4.19,Female,Yes,Thur,Lunch,2,9.9,Kristy Boyd,4317015327600068,Thur967
28.44,2.56,Male,Yes,Thur,Lunch,2,14.22,Dr. Jeffrey Rich,4737538358295,Thur4334
15.48,2.02,Male,Yes,Thur,Lunch,2,7.74,Raymond Sullivan,180068856139315,Thur606
16.58,4.0,Male,Yes,Thur,Lunch,2,8.29,Benjamin Weber,676210011505,Thur9318
7.56,1.44,Male,No,Thur,Lunch,2,3.78,Michael White,4865390263095532,Thur697
10.34,2.0,Male,Yes,Thur,Lunch,2,5.17,Eric Martin,30442491190342,Thur9862
43.11,5.0,Female,Yes,Thur,Lunch,4,10.78,Brooke Soto,5544902205760175,Thur9313
13.0,2.0,Female,Yes,Thur,Lunch,2,6.5,Katherine Bond,4926725945192,Thur437
13.51,2.0,Male,Yes,Thur,Lunch,2,6.76,Joseph Murphy MD,6547218923471275,Thur2428
18.71,4.0,Male,Yes,Thur,Lunch,3,6.24,Jason Conrad,4581233003487,Thur6048
12.74,2.01,Female,Yes,Thur,Lunch,2,6.37,Abigail Parks,3586645396220590,Thur2544
13.0,2.0,Female,Yes,Thur,Lunch,2,6.5,Ashley Shaw,180088043008041,Thur1301
16.4,2.5,Female,Yes,Thur,Lunch,2,8.2,Toni Brooks,3582289985920239,Thur7770
20.53,4.0,Male,Yes,Thur,Lunch,4,5.13,Scott Kim,3570611756827620,Thur2160
16.47,3.23,Female,Yes,Thur,Lunch,3,5.49,Carly Reyes,4787787236486,Thur8084
26.59,3.41,Male,Yes,Sat,Dinner,3,8.86,Daniel Owens,38971087967574,Sat1
38.73,3.0,Male,Yes,Sat,Dinner,4,9.68,Ricky Ramirez,347817964484033,Sat4505
24.27,2.03,Male,Yes,Sat,Dinner,2,12.14,Jason Carter,4268942915626180,Sat6048
12.76,2.23,Female,Yes,Sat,Dinner,2,6.38,Sarah Cunningham,341876516331163,Sat1274
30.06,2.0,Male,Yes,Sat,Dinner,3,10.02,Shawn Mendoza,30184049218122,Sat8361
25.89,5.16,Male,Yes,Sat,Dinner,4,6.47,Christopher Li,6011962464150569,Sat6735
48.33,9.0,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590
13.27,2.5,Female,Yes,Sat,Dinner,2,6.64,Robin Andersen,580140531089,Sat1374
28.17,6.5,Female,Yes,Sat,Dinner,3,9.39,Marissa Jackson,4922302538691962,Sat3374
12.9,1.1,Female,Yes,Sat,Dinner,2,6.45,Jessica Owen,4726904879471,Sat6983
28.15,3.0,Male,Yes,Sat,Dinner,5,5.63,Shawn Barnett PhD,4590982568244,Sat7320
11.59,1.5,Male,Yes,Sat,Dinner,2,5.8,Gary Orr,30324521283406,Sat8489
7.74,1.44,Male,Yes,Sat,Dinner,2,3.87,Nicholas Archer,340517153733524,Sat4772
30.14,3.09,Female,Yes,Sat,Dinner,4,7.54,Shelby House,502097403252,Sat8863
12.16,2.2,Male,Yes,Fri,Lunch,2,6.08,Ricky Johnson,213109508670736,Fri4607
13.42,3.48,Female,Yes,Fri,Lunch,2,6.71,Leslie Kaufman,379437981958785,Fri7511
8.58,1.92,Male,Yes,Fri,Lunch,1,8.58,Jason Lawrence,3505302934650403,Fri6624
15.98,3.0,Female,No,Fri,Lunch,3,5.33,Mary Rivera,5343428579353069,Fri6014
13.42,1.58,Male,Yes,Fri,Lunch,2,6.71,Ronald Vaughn DVM,341503466406403,Fri5959
16.27,2.5,Female,Yes,Fri,Lunch,2,8.14,Whitney Arnold,3579111947217428,Fri6665
10.09,2.0,Female,Yes,Fri,Lunch,2,5.04,Ruth Weiss,5268689490381635,Fri6359
20.45,3.0,Male,No,Sat,Dinner,4,5.11,Robert Bradley,213141668145910,Sat4319
13.28,2.72,Male,No,Sat,Dinner,2,6.64,Glenn Jones,502061651712,Sat2937
22.12,2.88,Female,Yes,Sat,Dinner,2,11.06,Jennifer Russell,4793003293608,Sat3943
24.01,2.0,Male,Yes,Sat,Dinner,4,6.0,Michael Osborne,4258682154026,Sat7872
15.69,3.0,Male,Yes,Sat,Dinner,3,5.23,Jason Parks,4812333796161,Sat6334
11.61,3.39,Male,No,Sat,Dinner,2,5.8,James Taylor,6011482917327995,Sat2124
10.77,1.47,Male,No,Sat,Dinner,2,5.38,Paul Novak,6011698897610858,Sat1467
15.53,3.0,Male,Yes,Sat,Dinner,2,7.76,Tracy Douglas,4097938155941930,Sat7220
10.07,1.25,Male,No,Sat,Dinner,2,5.04,Sean Gonzalez,3534021246117605,Sat4615
12.6,1.0,Male,Yes,Sat,Dinner,2,6.3,Matthew Myers,3543676378973965,Sat5032
32.83,1.17,Male,Yes,Sat,Dinner,2,16.42,Thomas Brown,4284722681265508,Sat2929
35.83,4.67,Female,No,Sat,Dinner,3,11.94,Kimberly Crane,676184013727,Sat9777
29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657
27.18,2.0,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,Sat1766
22.67,2.0,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880
17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,Sat17
18.78,3.0,Female,No,Thur,Dinner,2,9.39,Michelle Hardin,3511451626698139,Thur672


In [None]:
# @title
%%writefile mpg.csv
mpg,cylinders,displacement,horsepower,weight,acceleration,model_year,origin,name
18,8,307,130,3504,12,70,1,chevrolet chevelle malibu
15,8,350,165,3693,11.5,70,1,buick skylark 320
18,8,318,150,3436,11,70,1,plymouth satellite
16,8,304,150,3433,12,70,1,amc rebel sst
17,8,302,140,3449,10.5,70,1,ford torino
15,8,429,198,4341,10,70,1,ford galaxie 500
14,8,454,220,4354,9,70,1,chevrolet impala
14,8,440,215,4312,8.5,70,1,plymouth fury iii
14,8,455,225,4425,10,70,1,pontiac catalina
15,8,390,190,3850,8.5,70,1,amc ambassador dpl
15,8,383,170,3563,10,70,1,dodge challenger se
14,8,340,160,3609,8,70,1,plymouth 'cuda 340
15,8,400,150,3761,9.5,70,1,chevrolet monte carlo
14,8,455,225,3086,10,70,1,buick estate wagon (sw)
24,4,113,95,2372,15,70,3,toyota corona mark ii
22,6,198,95,2833,15.5,70,1,plymouth duster
18,6,199,97,2774,15.5,70,1,amc hornet
21,6,200,85,2587,16,70,1,ford maverick
27,4,97,88,2130,14.5,70,3,datsun pl510
26,4,97,46,1835,20.5,70,2,volkswagen 1131 deluxe sedan
25,4,110,87,2672,17.5,70,2,peugeot 504
24,4,107,90,2430,14.5,70,2,audi 100 ls
25,4,104,95,2375,17.5,70,2,saab 99e
26,4,121,113,2234,12.5,70,2,bmw 2002
21,6,199,90,2648,15,70,1,amc gremlin
10,8,360,215,4615,14,70,1,ford f250
10,8,307,200,4376,15,70,1,chevy c20
11,8,318,210,4382,13.5,70,1,dodge d200
9,8,304,193,4732,18.5,70,1,hi 1200d
27,4,97,88,2130,14.5,71,3,datsun pl510
28,4,140,90,2264,15.5,71,1,chevrolet vega 2300
25,4,113,95,2228,14,71,3,toyota corona
25,4,98,?,2046,19,71,1,ford pinto
19,6,232,100,2634,13,71,1,amc gremlin
16,6,225,105,3439,15.5,71,1,plymouth satellite custom
17,6,250,100,3329,15.5,71,1,chevrolet chevelle malibu
19,6,250,88,3302,15.5,71,1,ford torino 500
18,6,232,100,3288,15.5,71,1,amc matador
14,8,350,165,4209,12,71,1,chevrolet impala
14,8,400,175,4464,11.5,71,1,pontiac catalina brougham
14,8,351,153,4154,13.5,71,1,ford galaxie 500
14,8,318,150,4096,13,71,1,plymouth fury iii
12,8,383,180,4955,11.5,71,1,dodge monaco (sw)
13,8,400,170,4746,12,71,1,ford country squire (sw)
13,8,400,175,5140,12,71,1,pontiac safari (sw)
18,6,258,110,2962,13.5,71,1,amc hornet sportabout (sw)
22,4,140,72,2408,19,71,1,chevrolet vega (sw)
19,6,250,100,3282,15,71,1,pontiac firebird
18,6,250,88,3139,14.5,71,1,ford mustang
23,4,122,86,2220,14,71,1,mercury capri 2000
28,4,116,90,2123,14,71,2,opel 1900
30,4,79,70,2074,19.5,71,2,peugeot 304
30,4,88,76,2065,14.5,71,2,fiat 124b
31,4,71,65,1773,19,71,3,toyota corolla 1200
35,4,72,69,1613,18,71,3,datsun 1200
27,4,97,60,1834,19,71,2,volkswagen model 111
26,4,91,70,1955,20.5,71,1,plymouth cricket
24,4,113,95,2278,15.5,72,3,toyota corona hardtop
25,4,97.5,80,2126,17,72,1,dodge colt hardtop
23,4,97,54,2254,23.5,72,2,volkswagen type 3
20,4,140,90,2408,19.5,72,1,chevrolet vega
21,4,122,86,2226,16.5,72,1,ford pinto runabout
13,8,350,165,4274,12,72,1,chevrolet impala
14,8,400,175,4385,12,72,1,pontiac catalina
15,8,318,150,4135,13.5,72,1,plymouth fury iii
14,8,351,153,4129,13,72,1,ford galaxie 500
17,8,304,150,3672,11.5,72,1,amc ambassador sst
11,8,429,208,4633,11,72,1,mercury marquis
13,8,350,155,4502,13.5,72,1,buick lesabre custom
12,8,350,160,4456,13.5,72,1,oldsmobile delta 88 royale
13,8,400,190,4422,12.5,72,1,chrysler newport royal
19,3,70,97,2330,13.5,72,3,mazda rx2 coupe
15,8,304,150,3892,12.5,72,1,amc matador (sw)
13,8,307,130,4098,14,72,1,chevrolet chevelle concours (sw)
13,8,302,140,4294,16,72,1,ford gran torino (sw)
14,8,318,150,4077,14,72,1,plymouth satellite custom (sw)
18,4,121,112,2933,14.5,72,2,volvo 145e (sw)
22,4,121,76,2511,18,72,2,volkswagen 411 (sw)
21,4,120,87,2979,19.5,72,2,peugeot 504 (sw)
26,4,96,69,2189,18,72,2,renault 12 (sw)
22,4,122,86,2395,16,72,1,ford pinto (sw)
28,4,97,92,2288,17,72,3,datsun 510 (sw)
23,4,120,97,2506,14.5,72,3,toyouta corona mark ii (sw)
28,4,98,80,2164,15,72,1,dodge colt (sw)
27,4,97,88,2100,16.5,72,3,toyota corolla 1600 (sw)
13,8,350,175,4100,13,73,1,buick century 350
14,8,304,150,3672,11.5,73,1,amc matador
13,8,350,145,3988,13,73,1,chevrolet malibu
14,8,302,137,4042,14.5,73,1,ford gran torino
15,8,318,150,3777,12.5,73,1,dodge coronet custom
12,8,429,198,4952,11.5,73,1,mercury marquis brougham
13,8,400,150,4464,12,73,1,chevrolet caprice classic
13,8,351,158,4363,13,73,1,ford ltd
14,8,318,150,4237,14.5,73,1,plymouth fury gran sedan
13,8,440,215,4735,11,73,1,chrysler new yorker brougham
12,8,455,225,4951,11,73,1,buick electra 225 custom
13,8,360,175,3821,11,73,1,amc ambassador brougham
18,6,225,105,3121,16.5,73,1,plymouth valiant
16,6,250,100,3278,18,73,1,chevrolet nova custom
18,6,232,100,2945,16,73,1,amc hornet
18,6,250,88,3021,16.5,73,1,ford maverick
23,6,198,95,2904,16,73,1,plymouth duster
26,4,97,46,1950,21,73,2,volkswagen super beetle
11,8,400,150,4997,14,73,1,chevrolet impala
12,8,400,167,4906,12.5,73,1,ford country
13,8,360,170,4654,13,73,1,plymouth custom suburb
12,8,350,180,4499,12.5,73,1,oldsmobile vista cruiser
18,6,232,100,2789,15,73,1,amc gremlin
20,4,97,88,2279,19,73,3,toyota carina
21,4,140,72,2401,19.5,73,1,chevrolet vega
22,4,108,94,2379,16.5,73,3,datsun 610
18,3,70,90,2124,13.5,73,3,maxda rx3
19,4,122,85,2310,18.5,73,1,ford pinto
21,6,155,107,2472,14,73,1,mercury capri v6
26,4,98,90,2265,15.5,73,2,fiat 124 sport coupe
15,8,350,145,4082,13,73,1,chevrolet monte carlo s
16,8,400,230,4278,9.5,73,1,pontiac grand prix
29,4,68,49,1867,19.5,73,2,fiat 128
24,4,116,75,2158,15.5,73,2,opel manta
20,4,114,91,2582,14,73,2,audi 100ls
19,4,121,112,2868,15.5,73,2,volvo 144ea
15,8,318,150,3399,11,73,1,dodge dart custom
24,4,121,110,2660,14,73,2,saab 99le
20,6,156,122,2807,13.5,73,3,toyota mark ii
11,8,350,180,3664,11,73,1,oldsmobile omega
20,6,198,95,3102,16.5,74,1,plymouth duster
21,6,200,?,2875,17,74,1,ford maverick
19,6,232,100,2901,16,74,1,amc hornet
15,6,250,100,3336,17,74,1,chevrolet nova
31,4,79,67,1950,19,74,3,datsun b210
26,4,122,80,2451,16.5,74,1,ford pinto
32,4,71,65,1836,21,74,3,toyota corolla 1200
25,4,140,75,2542,17,74,1,chevrolet vega
16,6,250,100,3781,17,74,1,chevrolet chevelle malibu classic
16,6,258,110,3632,18,74,1,amc matador
18,6,225,105,3613,16.5,74,1,plymouth satellite sebring
16,8,302,140,4141,14,74,1,ford gran torino
13,8,350,150,4699,14.5,74,1,buick century luxus (sw)
14,8,318,150,4457,13.5,74,1,dodge coronet custom (sw)
14,8,302,140,4638,16,74,1,ford gran torino (sw)
14,8,304,150,4257,15.5,74,1,amc matador (sw)
29,4,98,83,2219,16.5,74,2,audi fox
26,4,79,67,1963,15.5,74,2,volkswagen dasher
26,4,97,78,2300,14.5,74,2,opel manta
31,4,76,52,1649,16.5,74,3,toyota corona
32,4,83,61,2003,19,74,3,datsun 710
28,4,90,75,2125,14.5,74,1,dodge colt
24,4,90,75,2108,15.5,74,2,fiat 128
26,4,116,75,2246,14,74,2,fiat 124 tc
24,4,120,97,2489,15,74,3,honda civic
26,4,108,93,2391,15.5,74,3,subaru
31,4,79,67,2000,16,74,2,fiat x1.9
19,6,225,95,3264,16,75,1,plymouth valiant custom
18,6,250,105,3459,16,75,1,chevrolet nova
15,6,250,72,3432,21,75,1,mercury monarch
15,6,250,72,3158,19.5,75,1,ford maverick
16,8,400,170,4668,11.5,75,1,pontiac catalina
15,8,350,145,4440,14,75,1,chevrolet bel air
16,8,318,150,4498,14.5,75,1,plymouth grand fury
14,8,351,148,4657,13.5,75,1,ford ltd
17,6,231,110,3907,21,75,1,buick century
16,6,250,105,3897,18.5,75,1,chevroelt chevelle malibu
15,6,258,110,3730,19,75,1,amc matador
18,6,225,95,3785,19,75,1,plymouth fury
21,6,231,110,3039,15,75,1,buick skyhawk
20,8,262,110,3221,13.5,75,1,chevrolet monza 2+2
13,8,302,129,3169,12,75,1,ford mustang ii
29,4,97,75,2171,16,75,3,toyota corolla
23,4,140,83,2639,17,75,1,ford pinto
20,6,232,100,2914,16,75,1,amc gremlin
23,4,140,78,2592,18.5,75,1,pontiac astro
24,4,134,96,2702,13.5,75,3,toyota corona
25,4,90,71,2223,16.5,75,2,volkswagen dasher
24,4,119,97,2545,17,75,3,datsun 710
18,6,171,97,2984,14.5,75,1,ford pinto
29,4,90,70,1937,14,75,2,volkswagen rabbit
19,6,232,90,3211,17,75,1,amc pacer
23,4,115,95,2694,15,75,2,audi 100ls
23,4,120,88,2957,17,75,2,peugeot 504
22,4,121,98,2945,14.5,75,2,volvo 244dl
25,4,121,115,2671,13.5,75,2,saab 99le
33,4,91,53,1795,17.5,75,3,honda civic cvcc
28,4,107,86,2464,15.5,76,2,fiat 131
25,4,116,81,2220,16.9,76,2,opel 1900
25,4,140,92,2572,14.9,76,1,capri ii
26,4,98,79,2255,17.7,76,1,dodge colt
27,4,101,83,2202,15.3,76,2,renault 12tl
17.5,8,305,140,4215,13,76,1,chevrolet chevelle malibu classic
16,8,318,150,4190,13,76,1,dodge coronet brougham
15.5,8,304,120,3962,13.9,76,1,amc matador
14.5,8,351,152,4215,12.8,76,1,ford gran torino
22,6,225,100,3233,15.4,76,1,plymouth valiant
22,6,250,105,3353,14.5,76,1,chevrolet nova
24,6,200,81,3012,17.6,76,1,ford maverick
22.5,6,232,90,3085,17.6,76,1,amc hornet
29,4,85,52,2035,22.2,76,1,chevrolet chevette
24.5,4,98,60,2164,22.1,76,1,chevrolet woody
29,4,90,70,1937,14.2,76,2,vw rabbit
33,4,91,53,1795,17.4,76,3,honda civic
20,6,225,100,3651,17.7,76,1,dodge aspen se
18,6,250,78,3574,21,76,1,ford granada ghia
18.5,6,250,110,3645,16.2,76,1,pontiac ventura sj
17.5,6,258,95,3193,17.8,76,1,amc pacer d/l
29.5,4,97,71,1825,12.2,76,2,volkswagen rabbit
32,4,85,70,1990,17,76,3,datsun b-210
28,4,97,75,2155,16.4,76,3,toyota corolla
26.5,4,140,72,2565,13.6,76,1,ford pinto
20,4,130,102,3150,15.7,76,2,volvo 245
13,8,318,150,3940,13.2,76,1,plymouth volare premier v8
19,4,120,88,3270,21.9,76,2,peugeot 504
19,6,156,108,2930,15.5,76,3,toyota mark ii
16.5,6,168,120,3820,16.7,76,2,mercedes-benz 280s
16.5,8,350,180,4380,12.1,76,1,cadillac seville
13,8,350,145,4055,12,76,1,chevy c10
13,8,302,130,3870,15,76,1,ford f108
13,8,318,150,3755,14,76,1,dodge d100
31.5,4,98,68,2045,18.5,77,3,honda accord cvcc
30,4,111,80,2155,14.8,77,1,buick opel isuzu deluxe
36,4,79,58,1825,18.6,77,2,renault 5 gtl
25.5,4,122,96,2300,15.5,77,1,plymouth arrow gs
33.5,4,85,70,1945,16.8,77,3,datsun f-10 hatchback
17.5,8,305,145,3880,12.5,77,1,chevrolet caprice classic
17,8,260,110,4060,19,77,1,oldsmobile cutlass supreme
15.5,8,318,145,4140,13.7,77,1,dodge monaco brougham
15,8,302,130,4295,14.9,77,1,mercury cougar brougham
17.5,6,250,110,3520,16.4,77,1,chevrolet concours
20.5,6,231,105,3425,16.9,77,1,buick skylark
19,6,225,100,3630,17.7,77,1,plymouth volare custom
18.5,6,250,98,3525,19,77,1,ford granada
16,8,400,180,4220,11.1,77,1,pontiac grand prix lj
15.5,8,350,170,4165,11.4,77,1,chevrolet monte carlo landau
15.5,8,400,190,4325,12.2,77,1,chrysler cordoba
16,8,351,149,4335,14.5,77,1,ford thunderbird
29,4,97,78,1940,14.5,77,2,volkswagen rabbit custom
24.5,4,151,88,2740,16,77,1,pontiac sunbird coupe
26,4,97,75,2265,18.2,77,3,toyota corolla liftback
25.5,4,140,89,2755,15.8,77,1,ford mustang ii 2+2
30.5,4,98,63,2051,17,77,1,chevrolet chevette
33.5,4,98,83,2075,15.9,77,1,dodge colt m/m
30,4,97,67,1985,16.4,77,3,subaru dl
30.5,4,97,78,2190,14.1,77,2,volkswagen dasher
22,6,146,97,2815,14.5,77,3,datsun 810
21.5,4,121,110,2600,12.8,77,2,bmw 320i
21.5,3,80,110,2720,13.5,77,3,mazda rx-4
43.1,4,90,48,1985,21.5,78,2,volkswagen rabbit custom diesel
36.1,4,98,66,1800,14.4,78,1,ford fiesta
32.8,4,78,52,1985,19.4,78,3,mazda glc deluxe
39.4,4,85,70,2070,18.6,78,3,datsun b210 gx
36.1,4,91,60,1800,16.4,78,3,honda civic cvcc
19.9,8,260,110,3365,15.5,78,1,oldsmobile cutlass salon brougham
19.4,8,318,140,3735,13.2,78,1,dodge diplomat
20.2,8,302,139,3570,12.8,78,1,mercury monarch ghia
19.2,6,231,105,3535,19.2,78,1,pontiac phoenix lj
20.5,6,200,95,3155,18.2,78,1,chevrolet malibu
20.2,6,200,85,2965,15.8,78,1,ford fairmont (auto)
25.1,4,140,88,2720,15.4,78,1,ford fairmont (man)
20.5,6,225,100,3430,17.2,78,1,plymouth volare
19.4,6,232,90,3210,17.2,78,1,amc concord
20.6,6,231,105,3380,15.8,78,1,buick century special
20.8,6,200,85,3070,16.7,78,1,mercury zephyr
18.6,6,225,110,3620,18.7,78,1,dodge aspen
18.1,6,258,120,3410,15.1,78,1,amc concord d/l
19.2,8,305,145,3425,13.2,78,1,chevrolet monte carlo landau
17.7,6,231,165,3445,13.4,78,1,buick regal sport coupe (turbo)
18.1,8,302,139,3205,11.2,78,1,ford futura
17.5,8,318,140,4080,13.7,78,1,dodge magnum xe
30,4,98,68,2155,16.5,78,1,chevrolet chevette
27.5,4,134,95,2560,14.2,78,3,toyota corona
27.2,4,119,97,2300,14.7,78,3,datsun 510
30.9,4,105,75,2230,14.5,78,1,dodge omni
21.1,4,134,95,2515,14.8,78,3,toyota celica gt liftback
23.2,4,156,105,2745,16.7,78,1,plymouth sapporo
23.8,4,151,85,2855,17.6,78,1,oldsmobile starfire sx
23.9,4,119,97,2405,14.9,78,3,datsun 200-sx
20.3,5,131,103,2830,15.9,78,2,audi 5000
17,6,163,125,3140,13.6,78,2,volvo 264gl
21.6,4,121,115,2795,15.7,78,2,saab 99gle
16.2,6,163,133,3410,15.8,78,2,peugeot 604sl
31.5,4,89,71,1990,14.9,78,2,volkswagen scirocco
29.5,4,98,68,2135,16.6,78,3,honda accord lx
21.5,6,231,115,3245,15.4,79,1,pontiac lemans v6
19.8,6,200,85,2990,18.2,79,1,mercury zephyr 6
22.3,4,140,88,2890,17.3,79,1,ford fairmont 4
20.2,6,232,90,3265,18.2,79,1,amc concord dl 6
20.6,6,225,110,3360,16.6,79,1,dodge aspen 6
17,8,305,130,3840,15.4,79,1,chevrolet caprice classic
17.6,8,302,129,3725,13.4,79,1,ford ltd landau
16.5,8,351,138,3955,13.2,79,1,mercury grand marquis
18.2,8,318,135,3830,15.2,79,1,dodge st. regis
16.9,8,350,155,4360,14.9,79,1,buick estate wagon (sw)
15.5,8,351,142,4054,14.3,79,1,ford country squire (sw)
19.2,8,267,125,3605,15,79,1,chevrolet malibu classic (sw)
18.5,8,360,150,3940,13,79,1,chrysler lebaron town @ country (sw)
31.9,4,89,71,1925,14,79,2,vw rabbit custom
34.1,4,86,65,1975,15.2,79,3,maxda glc deluxe
35.7,4,98,80,1915,14.4,79,1,dodge colt hatchback custom
27.4,4,121,80,2670,15,79,1,amc spirit dl
25.4,5,183,77,3530,20.1,79,2,mercedes benz 300d
23,8,350,125,3900,17.4,79,1,cadillac eldorado
27.2,4,141,71,3190,24.8,79,2,peugeot 504
23.9,8,260,90,3420,22.2,79,1,oldsmobile cutlass salon brougham
34.2,4,105,70,2200,13.2,79,1,plymouth horizon
34.5,4,105,70,2150,14.9,79,1,plymouth horizon tc3
31.8,4,85,65,2020,19.2,79,3,datsun 210
37.3,4,91,69,2130,14.7,79,2,fiat strada custom
28.4,4,151,90,2670,16,79,1,buick skylark limited
28.8,6,173,115,2595,11.3,79,1,chevrolet citation
26.8,6,173,115,2700,12.9,79,1,oldsmobile omega brougham
33.5,4,151,90,2556,13.2,79,1,pontiac phoenix
41.5,4,98,76,2144,14.7,80,2,vw rabbit
38.1,4,89,60,1968,18.8,80,3,toyota corolla tercel
32.1,4,98,70,2120,15.5,80,1,chevrolet chevette
37.2,4,86,65,2019,16.4,80,3,datsun 310
28,4,151,90,2678,16.5,80,1,chevrolet citation
26.4,4,140,88,2870,18.1,80,1,ford fairmont
24.3,4,151,90,3003,20.1,80,1,amc concord
19.1,6,225,90,3381,18.7,80,1,dodge aspen
34.3,4,97,78,2188,15.8,80,2,audi 4000
29.8,4,134,90,2711,15.5,80,3,toyota corona liftback
31.3,4,120,75,2542,17.5,80,3,mazda 626
37,4,119,92,2434,15,80,3,datsun 510 hatchback
32.2,4,108,75,2265,15.2,80,3,toyota corolla
46.6,4,86,65,2110,17.9,80,3,mazda glc
27.9,4,156,105,2800,14.4,80,1,dodge colt
40.8,4,85,65,2110,19.2,80,3,datsun 210
44.3,4,90,48,2085,21.7,80,2,vw rabbit c (diesel)
43.4,4,90,48,2335,23.7,80,2,vw dasher (diesel)
36.4,5,121,67,2950,19.9,80,2,audi 5000s (diesel)
30,4,146,67,3250,21.8,80,2,mercedes-benz 240d
44.6,4,91,67,1850,13.8,80,3,honda civic 1500 gl
40.9,4,85,?,1835,17.3,80,2,renault lecar deluxe
33.8,4,97,67,2145,18,80,3,subaru dl
29.8,4,89,62,1845,15.3,80,2,vokswagen rabbit
32.7,6,168,132,2910,11.4,80,3,datsun 280-zx
23.7,3,70,100,2420,12.5,80,3,mazda rx-7 gs
35,4,122,88,2500,15.1,80,2,triumph tr7 coupe
23.6,4,140,?,2905,14.3,80,1,ford mustang cobra
32.4,4,107,72,2290,17,80,3,honda accord
27.2,4,135,84,2490,15.7,81,1,plymouth reliant
26.6,4,151,84,2635,16.4,81,1,buick skylark
25.8,4,156,92,2620,14.4,81,1,dodge aries wagon (sw)
23.5,6,173,110,2725,12.6,81,1,chevrolet citation
30,4,135,84,2385,12.9,81,1,plymouth reliant
39.1,4,79,58,1755,16.9,81,3,toyota starlet
39,4,86,64,1875,16.4,81,1,plymouth champ
35.1,4,81,60,1760,16.1,81,3,honda civic 1300
32.3,4,97,67,2065,17.8,81,3,subaru
37,4,85,65,1975,19.4,81,3,datsun 210 mpg
37.7,4,89,62,2050,17.3,81,3,toyota tercel
34.1,4,91,68,1985,16,81,3,mazda glc 4
34.7,4,105,63,2215,14.9,81,1,plymouth horizon 4
34.4,4,98,65,2045,16.2,81,1,ford escort 4w
29.9,4,98,65,2380,20.7,81,1,ford escort 2h
33,4,105,74,2190,14.2,81,2,volkswagen jetta
34.5,4,100,?,2320,15.8,81,2,renault 18i
33.7,4,107,75,2210,14.4,81,3,honda prelude
32.4,4,108,75,2350,16.8,81,3,toyota corolla
32.9,4,119,100,2615,14.8,81,3,datsun 200sx
31.6,4,120,74,2635,18.3,81,3,mazda 626
28.1,4,141,80,3230,20.4,81,2,peugeot 505s turbo diesel
30.7,6,145,76,3160,19.6,81,2,volvo diesel
25.4,6,168,116,2900,12.6,81,3,toyota cressida
24.2,6,146,120,2930,13.8,81,3,datsun 810 maxima
22.4,6,231,110,3415,15.8,81,1,buick century
26.6,8,350,105,3725,19,81,1,oldsmobile cutlass ls
20.2,6,200,88,3060,17.1,81,1,ford granada gl
17.6,6,225,85,3465,16.6,81,1,chrysler lebaron salon
28,4,112,88,2605,19.6,82,1,chevrolet cavalier
27,4,112,88,2640,18.6,82,1,chevrolet cavalier wagon
34,4,112,88,2395,18,82,1,chevrolet cavalier 2-door
31,4,112,85,2575,16.2,82,1,pontiac j2000 se hatchback
29,4,135,84,2525,16,82,1,dodge aries se
27,4,151,90,2735,18,82,1,pontiac phoenix
24,4,140,92,2865,16.4,82,1,ford fairmont futura
23,4,151,?,3035,20.5,82,1,amc concord dl
36,4,105,74,1980,15.3,82,2,volkswagen rabbit l
37,4,91,68,2025,18.2,82,3,mazda glc custom l
31,4,91,68,1970,17.6,82,3,mazda glc custom
38,4,105,63,2125,14.7,82,1,plymouth horizon miser
36,4,98,70,2125,17.3,82,1,mercury lynx l
36,4,120,88,2160,14.5,82,3,nissan stanza xe
36,4,107,75,2205,14.5,82,3,honda accord
34,4,108,70,2245,16.9,82,3,toyota corolla
38,4,91,67,1965,15,82,3,honda civic
32,4,91,67,1965,15.7,82,3,honda civic (auto)
38,4,91,67,1995,16.2,82,3,datsun 310 gx
25,6,181,110,2945,16.4,82,1,buick century limited
38,6,262,85,3015,17,82,1,oldsmobile cutlass ciera (diesel)
26,4,156,92,2585,14.5,82,1,chrysler lebaron medallion
22,6,232,112,2835,14.7,82,1,ford granada l
32,4,144,96,2665,13.9,82,3,toyota celica gt
36,4,135,84,2370,13,82,1,dodge charger 2.2
27,4,151,90,2950,17.3,82,1,chevrolet camaro
27,4,140,86,2790,15.6,82,1,ford mustang gl
44,4,97,52,2130,24.6,82,2,vw pickup
32,4,135,84,2295,11.6,82,1,dodge rampage
28,4,120,79,2625,18.6,82,1,ford ranger
31,4,119,82,2720,19.4,82,1,chevy s-10

### Counting using .value_counts()
Often times you'll work with categorical values, and you'll want to count the number of observations each category has in a column. Category values can be counted using the .value_counts()



In [None]:
#create DataFrame
df = pd.DataFrame({'points': [9, 9, 9, 10, 10, 13, 15, 22],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': ['one', 'one', 'two', 'two', 'one', 'three', 'four', 'three']})
#count occurrences of unique values in 'points' column
df['points'].value_counts()

Unnamed: 0_level_0,count
points,Unnamed: 1_level_1
9,3
10,2
13,1
15,1
22,1


In [None]:
df = pd.read_csv('Titanic-Dataset.csv')
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [None]:
df['Embarked'].value_counts()

Unnamed: 0_level_0,count
Embarked,Unnamed: 1_level_1
S,644
C,168
Q,77


Adding the normalize argument returns proportions instead of absolute counts.

Turn off automatic sorting of results using sort argument (True by default). The default sorting is based on the counts in descending order.


In [None]:
df['Name'].value_counts(normalize=True,sort =True)

KeyError: ('Name', 'Embarked')

You can also apply .value_counts() to a DataFrame object and specific columns within it instead of just a column. Here, for example, we are applying value_counts() on df with the subset argument, which takes in a list of columns.

In [None]:
df.value_counts(subset=['rebounds', 'points'])

## Selection and Indexing

Let's learn how to retrieve information from a DataFrame.

In [None]:
df = pd.read_csv('tips.csv')

### COLUMNS

We will begin be learning how to extract information based on the columns

In [None]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


#### Grab a Single Column

In [None]:
df['Age']

Unnamed: 0,Age
0,22.0
1,38.0
2,26.0
3,35.0
4,35.0
...,...
886,27.0
887,19.0
888,
889,26.0


#### Grab Multiple Columns

In [None]:
# Note how its a python list of column names! Thus the double brackets.
df[['total_bill','tip']]

#### Create New Columns

In [None]:
df['tip_percentage'] = 100* df['tip'] / df['total_bill']

In [None]:
df.head()

In [None]:
df['price_per_person'] = df['total_bill'] / df['size']

In [None]:
df.head()

In [None]:
# Because pandas is based on numpy, we get awesome capabilities with numpy's universal functions!
df['price_per_person'] = np.round(df['price_per_person'],2)

In [None]:
df.head()

#### Remove Columns

In [None]:
df = df.drop(0,axis=0)

In [None]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,male,35.0,0,0,373450,8.05,,S
5,6,0,3,male,,0,0,330877,8.4583,,Q


# Index Basics

Before going over the same retrieval tasks for rows, let's build some basic understanding of the pandas DataFrame Index.

In [None]:
df.index

RangeIndex(start=1, stop=891, step=1)

In [None]:
df.set_index('Embarked')

Unnamed: 0_level_0,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin
Embarked,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
C,2,1,1,female,38.0,1,0,PC 17599,71.2833,C85
S,3,1,3,female,26.0,0,0,STON/O2. 3101282,7.9250,
S,4,1,1,female,35.0,1,0,113803,53.1000,C123
S,5,0,3,male,35.0,0,0,373450,8.0500,
Q,6,0,3,male,,0,0,330877,8.4583,
...,...,...,...,...,...,...,...,...,...,...
S,887,0,2,male,27.0,0,0,211536,13.0000,
S,888,1,1,female,19.0,0,0,112053,30.0000,B42
S,889,0,3,female,,1,2,W./C. 6607,23.4500,
C,890,1,1,male,26.0,0,0,111369,30.0000,C148


In [None]:
df = df.reset_index()

In [None]:
df.head()

Unnamed: 0,index,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,2,1,1,female,38.0,1,0,PC 17599,71.2833,C85,C
1,2,3,1,3,female,26.0,0,0,STON/O2. 3101282,7.925,,S
2,3,4,1,1,female,35.0,1,0,113803,53.1,C123,S
3,4,5,0,3,male,35.0,0,0,373450,8.05,,S
4,5,6,0,3,male,,0,0,330877,8.4583,,Q


### ROWS

Let's now explore these same concepts but with Rows.

In [None]:
df.head()

In [1]:
df = df.set_index('Payment ID')

NameError: name 'df' is not defined

In [None]:
df.head()

#### Grab a Single Row

In [None]:
# Integer Based
df.iloc[-1]

Unnamed: 0,890
PassengerId,891
Survived,0
Pclass,3
Name,"Dooley, Mr. Patrick"
Sex,male
Age,32.0
SibSp,0
Parch,0
Ticket,370376
Fare,7.75


In [None]:
# Name Based
df.loc['Name']

KeyError: 'Name'

#### Grab Multiple Rows

In [None]:
df.iloc[0:4]

In [None]:
df.loc[['Sun2959','Sun5260']]

#### Remove Row

Typically are datasets will be large enough that we won't remove rows like this since we won't know thier row location for some specific condition, instead, we drop rows based on conditions such as missing data or column values. The next lecture will cover this in a lot more detail.

In [None]:
df.head()

In [None]:
df.drop('Sun2959',axis=0).head()

In [None]:
# Error if you have a named index!
# df.drop(0,axis=0).head()

### Conditional Filtering


In [None]:
bool_series = df['Age'] > 30
#bool_series = df['Sex'] != 'Male'

In [None]:
df[bool_series]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
11,12,1,1,"Bonnell, Miss. Elizabeth",female,58.0,0,0,113783,26.5500,C103,S
...,...,...,...,...,...,...,...,...,...,...,...,...
873,874,0,3,"Vander Cruyssen, Mr. Victor",male,47.0,0,0,345765,9.0000,,S
879,880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56.0,0,1,11767,83.1583,C50,C
881,882,0,3,"Markun, Mr. Johann",male,33.0,0,0,349257,7.8958,,S
885,886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39.0,0,5,382652,29.1250,,Q


In [None]:
df[df['Sex'] == 'Male']

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked


## Multiple Conditions

Recall the steps:

* Get the conditions
* Wrap each condition in parenthesis
* Use the | or & operator, depending if you want an
    * OR | (either condition is True)
    * AND & (both conditions must be True)
* You can also use the ~ operator as a NOT operation

In [None]:
df[(df['total_bill'] > 30) & (df['sex']=='Male')]

## Conditional Operator isin()

We can use .isin() operator to filter by a list of options.

In [None]:
options = [1,2]
df['Survived'].isin(options)

Unnamed: 0,Survived
0,False
1,True
2,True
3,True
4,False
...,...
886,False
887,True
888,False
889,True


In [None]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


## Useful Functions

<a id='apply_function'></a>
### apply with a function

In [None]:
def last_four(num):
    return str(num)[-4:]

In [None]:
df['CC Number'][0]

In [None]:
last_four(3560325168603410)

In [None]:
df['last_four'] = df['CC Number'].apply(last_four)

<a id='apply_lambda'></a>
### apply with lambda

In [None]:
def simple(num):
    return num*2

In [None]:
lambda num: num*2

In [None]:
df['total_bill'].apply(lambda bill:bill*0.18)

<a id='apply_multiple'></a>
## apply that uses multiple columns

Note, there are several ways to do this:

https://stackoverflow.com/questions/19914937/applying-function-with-multiple-arguments-to-create-a-new-pandas-column

In [None]:
def quality(total_bill,tip):
    if tip/total_bill  > 0.25:
        return "Generous"
    else:
        return "Other"

In [None]:
df['Tip Quality'] = df[['total_bill','tip']].apply(lambda df: quality(df['total_bill'],df['tip']),axis=1)

In [None]:
import numpy as np

In [None]:
df['Tip Quality'] = np.vectorize(quality)(df['total_bill'], df['tip'])

In [None]:
import timeit

# code snippet to be executed only once
setup = '''
import numpy as np
import pandas as pd
df = pd.read_csv('tips.csv')
def quality(total_bill,tip):
    if tip/total_bill  > 0.25:
        return "Generous"
    else:
        return "Other"
'''

# code snippet whose execution time is to be measured
stmt_one = '''
df['Tip Quality'] = df[['total_bill','tip']].apply(lambda df: quality(df['total_bill'],df['tip']),axis=1)
'''

stmt_two = '''
df['Tip Quality'] = np.vectorize(quality)(df['total_bill'], df['tip'])
'''


In [None]:
timeit.timeit(setup = setup,
                    stmt = stmt_one,
                    number = 1000)

In [None]:
timeit.timeit(setup = setup,
                    stmt = stmt_two,
                    number = 1000)

Wow! Vectorization is much faster! Keep **np.vectorize()** in mind for the future.

Full Details:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html

<a id='apply_function'></a>
### apply with a function

In [None]:
def last_four(num):
    return str(num)[-4:]

In [None]:
df['CC Number'][0]

In [None]:
last_four(3560325168603410)

In [None]:
df['last_four'] = df['CC Number'].apply(last_four)

### Using .apply() with more complex functions

In [None]:
def yelp(price):
    if price < 10:
        return '$'
    elif price >= 10 and price < 30:
        return '$$'
    else:
        return '$$$'

In [None]:
df['Expensive'] = df['total_bill'].apply(yelp)

<a id='apply_lambda'></a>
### apply with lambda

In [None]:
def simple(num):
    return num*2

In [None]:
lambda num: num*2

In [None]:
df['total_bill'].apply(lambda bill:bill*0.18)

<a id='apply_multiple'></a>
## apply that uses multiple columns

Note, there are several ways to do this:

https://stackoverflow.com/questions/19914937/applying-function-with-multiple-arguments-to-create-a-new-pandas-column

In [None]:
def quality(total_bill,tip):
    if tip/total_bill  > 0.25:
        return "Generous"
    else:
        return "Other"

In [None]:
df['Tip Quality'] = df[['total_bill','tip']].apply(lambda df: quality(df['total_bill'],df['tip']),axis=1)

In [None]:
import numpy as np

In [None]:
df['Tip Quality'] = np.vectorize(quality)(df['total_bill'], df['tip'])

In [None]:
import timeit

# code snippet to be executed only once
setup = '''
import numpy as np
import pandas as pd
df = pd.read_csv('tips.csv')
def quality(total_bill,tip):
    if tip/total_bill  > 0.25:
        return "Generous"
    else:
        return "Other"
'''

# code snippet whose execution time is to be measured
stmt_one = '''
df['Tip Quality'] = df[['total_bill','tip']].apply(lambda df: quality(df['total_bill'],df['tip']),axis=1)
'''

stmt_two = '''
df['Tip Quality'] = np.vectorize(quality)(df['total_bill'], df['tip'])
'''


In [None]:
timeit.timeit(setup = setup,
                    stmt = stmt_one,
                    number = 1000)

In [None]:
timeit.timeit(setup = setup,
                    stmt = stmt_two,
                    number = 1000)

Wow! Vectorization is much faster! Keep **np.vectorize()** in mind for the future.

Full Details:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html

<a id='bet'></a>
## between

left: A scalar value that defines the left boundary

right: A scalar value that defines the right boundary

inclusive: Inclusive has to be either string of 'both','left', 'right', or 'neither'.

In [None]:
df['Age'].between(30,40,exclusive='both')

TypeError: Series.between() got an unexpected keyword argument 'exclusive'

In [None]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


<a id='n'></a>
## nlargest and nsmallest

In [None]:
df.nlargest(10,'tip')

## Cleaning data using pandas

Data cleaning is one of the most common tasks in data science. pandas lets you preprocess data for any use, including but not limited to training machine learning and deep learning models. Let’s use the DataFrame `df2` from earlier, having four missing values, to illustrate a few data cleaning use cases.

Check how many missing values are in each column:

In [None]:
df.isnull().sum()

Unnamed: 0,0
PassengerId,0
Survived,0
Pclass,0
Name,0
Sex,0
Age,177
SibSp,0
Parch,0
Ticket,0
Fare,0



### Dealing with missing data technique #1: Dropping missing values

You can use the `.dropna()` method to remove missing data.

Drop rows with missing values:

In [None]:
df.shape

(891, 12)

In [None]:
df.dropna()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.0,1,1,PP 9549,16.7000,G6,S
11,12,1,1,"Bonnell, Miss. Elizabeth",female,58.0,0,0,113783,26.5500,C103,S
...,...,...,...,...,...,...,...,...,...,...,...,...
871,872,1,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",female,47.0,1,1,11751,52.5542,D35,S
872,873,0,1,"Carlsson, Mr. Frans Olof",male,33.0,0,0,695,5.0000,B51 B53 B55,S
879,880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56.0,0,1,11767,83.1583,C50,C
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S


In [None]:
df.isnull().sum()

Unnamed: 0,0
PassengerId,0
Survived,0
Pclass,0
Name,0
Sex,0
Age,177
SibSp,0
Parch,0
Ticket,0
Fare,0


In [None]:
df3 = df.copy()
df3 = df3.dropna()
df3.shape

(183, 12)

In [None]:
#Drop columns with missing values:

df3 = df.copy()
df3.dropna(inplace=True, axis=1)
df3.head()


In [None]:
#You can also drop rows or columns where all values are missing by setting the `how` argument to `'all'`.

df3 = df.copy()
df3.dropna(inplace=True, how='all')

----
------
## Note! Typical comparisons should be avoided with Missing Values

* https://towardsdatascience.com/navigating-the-hell-of-nans-in-python-71b12558895b
* https://stackoverflow.com/questions/20320022/why-in-numpy-nan-nan-is-false-while-nan-in-nan-is-true

This is generally because the logic here is, since we don't know these values, we can't know if they are equal to each other.

In [None]:
np.nan == np.nan

False

In [None]:
np.nan in [np.nan]

True

In [None]:
np.nan is np.nan

True

In [None]:
pd.NA == pd.NA

<NA>

#### Interpolating missing values

## Filling with Interpolation

Be careful with this technique, you should try to really understand whether or not this is a valid choice for your data. You should also note there are several methods available, the default is a linear method.

Full Docs on this Method:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html

In [None]:
airline_tix = {'first':100,'business':np.nan,'economy-plus':50,'economy':30}

In [None]:
ser = pd.Series(airline_tix)

In [None]:
ser

Unnamed: 0,0
first,100.0
business,
economy-plus,50.0
economy,30.0


In [None]:
ser.interpolate()

Unnamed: 0,0
first,100.0
business,75.0
economy-plus,50.0
economy,30.0


In [None]:
df = pd.DataFrame(ser,columns=['Price'])

In [None]:
df.interpolate()

In [None]:
df = df.reset_index()

In [None]:
df

In [None]:
df.interpolate(method='spline',order=2)

### Aggregating data with `.groupby()`

Pandas groupby is used for grouping the data according to the categories and applying a function to the categories. It also helps to aggregate data efficiently. The Pandas groupby() is a very powerful function with a lot of variations. It makes the task of splitting the Dataframe over some criteria really easy and efficient.

#### Adding an aggregate method call. To use a grouped object, you need to tell pandas how you want to aggregate the data.

Common Options:

    mean(): Compute mean of groups
    sum(): Compute sum of group values
    size(): Compute group sizes
    count(): Compute count of group
    std(): Standard deviation of groups
    var(): Compute variance of groups
    sem(): Standard error of the mean of groups
    describe(): Generates descriptive statistics
    first(): Compute first of group values
    last(): Compute last of group values
    nth() : Take nth value, or a subset if n is a list
    min(): Compute min of group values
    max(): Compute max of group values
    
Full List at the Online Documentation: https://pandas.pydata.org/docs/reference/groupby.html

In [None]:
df = pd.read_csv('mpg.csv')

FileNotFoundError: [Errno 2] No such file or directory: 'mpg.csv'

In [None]:
df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


In [None]:
# Creates a groupby object waiting for an aggregate method
df.groupby('model_year')

KeyError: 'model_year'

In [None]:
# model_year becomes the index! It is NOT a column name,it is now the name of the index
df.groupby('model_year').sum()

KeyError: 'model_year'

In [None]:
df.groupby('model_year').describe().transpose()

## Groupby Multiple Columns
Let's explore total mpg per year per cylinder count

In [None]:
df.groupby(['model_year','cylinders']).sum()

## Data visulaization
All the columns of df can also be plotted on different scales and axes by using the subplots argument.



In [None]:

df.plot.line(subplots=True)

### More resources to get through the Pandas World

General Pandas Exercise
https://pynative.com/python-pandas-exercise/

Simple Exercise
https://nbviewer.jupyter.org/github/guipsamora/pandas_exercises/blob/master/01_Getting_%26_Knowing_Your_Data/Chipotle/Exercises.ipynb
https://nbviewer.jupyter.org/github/guipsamora/pandas_exercises/blob/master/01_Getting_%26_Knowing_Your_Data/Chipotle/Exercise_with_Solutions.ipynb

Filtering and Sorting Exercise
https://nbviewer.jupyter.org/github/guipsamora/pandas_exercises/blob/master/02_Filtering_%26_Sorting/Euro12/Exercises.ipynb

https://nbviewer.jupyter.org/github/guipsamora/pandas_exercises/blob/master/02_Filtering_%26_Sorting/Euro12/Exercises_with_Solutions.ipynb

GroupBy Exercise
https://nbviewer.jupyter.org/github/guipsamora/pandas_exercises/blob/master/03_Grouping/Alcohol_Consumption/Exercise.ipynb

https://nbviewer.jupyter.org/github/guipsamora/pandas_exercises/blob/master/03_Grouping/Alcohol_Consumption/Exercise_with_solutions.ipynb

Apply method
https://nbviewer.jupyter.org/github/guipsamora/pandas_exercises/blob/master/04_Apply/Students_Alcohol_Consumption/Exercises.ipynb

https://nbviewer.jupyter.org/github/guipsamora/pandas_exercises/blob/master/04_Apply/Students_Alcohol_Consumption/Exercises_with_solutions.ipynb

Merge
https://nbviewer.jupyter.org/github/guipsamora/pandas_exercises/blob/master/05_Merge/Auto_MPG/Exercises.ipynb

https://nbviewer.jupyter.org/github/guipsamora/pandas_exercises/blob/master/05_Merge/Auto_MPG/Exercises_with_solutions.ipynb

Data Vizualization
https://nbviewer.jupyter.org/github/guipsamora/pandas_exercises/blob/master/07_Visualization/Online_Retail/Exercises.ipynb

https://nbviewer.jupyter.org/github/guipsamora/pandas_exercises/blob/master/07_Visualization/Online_Retail/Exercises_with_solutions_code.ipynb