# IS4487 Week 2 - Practice Code

This notebook is designed to help you follow along with the **Week 2 Lecture and Reading**.

The practice code demos give you a chance to see working code, and you can use them as a starting point for your lab and assignment work. Each section contains short explanations and annotated code that reflect the steps in the reading.

### Topics for this demo:
- Importing data to a dataframe
- Filtering columns in a dataframe
- Filtering rows in a dataframe
- Aggregating data

<a href="https://colab.research.google.com/github/Stan-Pugsley/is_4487_base/blob/main/Demos/demo_02_dataframe_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

More information:
- Learn more about Colab here:  https://research.google.com/colaboratory/faq.html
- Learn more about Pandas here: https://pandas.pydata.org/docs/user_guide/10min.html


### Context: Motor Trend Car Road Tests
This example uses a small dataset of road tests for 32 cars, taken from the 1974 *Motor Trend* magazine. It is a classic dataset that statisticians have used for decades to learn to work with data.

| Column | Description                              |
| ------ | ---------------------------------------- |
| mpg    | Miles per gallon (fuel efficiency)       |
| cyl    | Number of cylinders                      |
| disp   | Displacement (cu. in.)                   |
| hp     | Gross horsepower                         |
| drat   | Rear axle ratio                          |
| wt     | Weight (1000 lbs)                        |
| qsec   | ¼ mile time (seconds)                    |
| vs     | Engine type (0 = V-shaped, 1 = straight) |
| am     | Transmission (0 = automatic, 1 = manual) |
| gear   | Number of forward gears                  |
| carb   | Number of carburetors                    |


Your task is to import the data into a dataframe and learn to work with it as you would an Excel sheet.

### Import libraries

We will import two libraries:
- Pandas, which is like Excel for Python. It creates 2-dimensional data frames and lets you work with the rows and columns.
- StatsModels, which provides sample datasets for experimenting with Python.

In [None]:
import pandas as pd
import statsmodels.api as sm

### Import Sample Data

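Load the same `mtcars` data used in Lab 1. `get_rdataset` downloads it from the Rdatasets repository, so an internet connection is needed the first time; `cache=True` keeps a local copy for later runs.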

In [None]:
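#load the dataset; the .data attribute is already a pandas DataFrame
mtcars = sm.datasets.get_rdataset("mtcars", "datasets", cache=True).data
#wrap it in a DataFrame named df (optional here, since mtcars is already a DataFrame)
df = pd.DataFrame(mtcars)
print(df)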
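To see what "rows and columns" means without downloading anything, here is a minimal sketch that builds a tiny dataframe by hand. The name `sample` and the choice of four rows are assumptions made for illustration; the values are a hand-typed subset of `mtcars`, not the full 32-row dataset. Note that the car names live in the index (the row labels), not in a regular column.

```python
import pandas as pd

# Hand-typed subset of mtcars, for illustration only
sample = pd.DataFrame(
    {
        "mpg": [21.0, 22.8, 33.9, 10.4],
        "cyl": [6, 4, 4, 8],
        "hp": [110, 93, 65, 205],
    },
    # the car names become the row labels (the index)
    index=["Mazda RX4", "Datsun 710", "Toyota Corolla", "Cadillac Fleetwood"],
)

print(sample.shape)            # (number of rows, number of columns)
print(sample.head(2))          # first two rows only
print(sample.index.tolist())   # row labels
print(sample.columns.tolist()) # column names
```

On the full dataset, `df.head()` is usually easier to read than `print(df)` because it shows only the first five rows.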

### Create Summary Statistics

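We will use Pandas functions to preview the data: `info()` lists each column's data type and non-null count, and `describe()` reports the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.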

In [None]:
df.info()

In [None]:
df.describe()

### Work with the DataFrame

We will filter, sort, and aggregate the dataset.

In [None]:
#remove the Toyota Corolla row (the car names are stored in the index)
df2 = df[df.index != 'Toyota Corolla']
print(df2)
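Rows can also be filtered on column values rather than on the index. The sketch below uses a small hand-typed subset of `mtcars` (the name `sample` and the threshold of 20 mpg are assumptions for illustration) to show a boolean mask, two combined conditions, and `drop()` as an alternative way to remove a row by its label.

```python
import pandas as pd

# Hand-typed subset of mtcars, for illustration only
sample = pd.DataFrame(
    {"mpg": [21.0, 22.8, 33.9, 10.4], "cyl": [6, 4, 4, 8]},
    index=["Mazda RX4", "Datsun 710", "Toyota Corolla", "Cadillac Fleetwood"],
)

# keep rows where the condition is True (a boolean mask)
efficient = sample[sample["mpg"] > 20]

# combine conditions with & (and) or | (or); each condition needs parentheses
efficient_four = sample[(sample["mpg"] > 20) & (sample["cyl"] == 4)]

# remove a row by its index label; returns a new dataframe
without_corolla = sample.drop(index="Toyota Corolla")

print(efficient)
print(efficient_four)
print(without_corolla)
```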

In [None]:
#create a new dataframe with the first two columns
df3 = df2[['mpg', 'cyl']]
print(df3)
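There are a few equivalent ways to pick columns. This sketch, again on a hand-typed subset of `mtcars` (the name `sample` is an assumption for illustration), compares selecting by name, selecting by position, and dropping the columns you do not want.

```python
import pandas as pd

# Hand-typed subset of mtcars, for illustration only
sample = pd.DataFrame(
    {"mpg": [21.0, 22.8, 33.9], "cyl": [6, 4, 4], "hp": [110, 93, 65]},
    index=["Mazda RX4", "Datsun 710", "Toyota Corolla"],
)

# a list of names inside double brackets returns a dataframe
by_name = sample[["mpg", "cyl"]]

# iloc selects by position: all rows, first two columns
by_position = sample.iloc[:, :2]

# drop removes the named columns and keeps the rest
without_hp = sample.drop(columns=["hp"])

# a single name in single brackets returns a Series, not a dataframe
mpg_only = sample["mpg"]

print(by_name)
print(type(mpg_only))
```

Selecting by name is usually the safer choice, since it keeps working if the column order changes.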

In [None]:

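#sort the rows by mpg
#assign the result back instead of using inplace=True, which can raise a SettingWithCopyWarning on a dataframe created by selecting from another one
df3 = df3.sort_values(by=['mpg'])
print(df3)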
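`sort_values` sorts in ascending order by default and returns a new dataframe. As a minimal sketch on a hand-typed subset of `mtcars` (the name `sample` is an assumption for illustration), the `ascending` argument reverses the order, and a list of column names sorts by more than one column.

```python
import pandas as pd

# Hand-typed subset of mtcars, for illustration only
sample = pd.DataFrame(
    {"mpg": [21.0, 22.8, 21.4, 33.9, 10.4], "cyl": [6, 4, 6, 4, 8]},
    index=["Mazda RX4", "Datsun 710", "Hornet 4 Drive",
           "Toyota Corolla", "Cadillac Fleetwood"],
)

# highest mpg first
ranked = sample.sort_values(by="mpg", ascending=False)

# sort by cylinder count, then by mpg within each cylinder count
by_cyl_then_mpg = sample.sort_values(by=["cyl", "mpg"])

print(ranked)
print(by_cyl_then_mpg)
```

The original `sample` keeps its row order, because the sorted result is stored under a new name.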

In [None]:
#aggregate the data to get the number of cars with each cylinder count
df4 = df3.groupby('cyl').size().reset_index(name='count')
print(df4)
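Counting rows is only one kind of aggregation. The sketch below, on a hand-typed subset of `mtcars`, computes both a count and an average per cylinder group in one step. The names `sample`, `n_cars`, and `avg_mpg` are assumptions chosen for illustration.

```python
import pandas as pd

# Hand-typed subset of mtcars, for illustration only
sample = pd.DataFrame(
    {"mpg": [21.0, 22.8, 21.4, 33.9, 10.4], "cyl": [6, 4, 6, 4, 8]},
    index=["Mazda RX4", "Datsun 710", "Hornet 4 Drive",
           "Toyota Corolla", "Cadillac Fleetwood"],
)

# group the rows by cylinder count, then summarize each group
summary = (
    sample.groupby("cyl")
    .agg(
        n_cars=("mpg", "count"),   # number of non-missing mpg values per group
        avg_mpg=("mpg", "mean"),   # average mpg per group
    )
    .reset_index()                 # turn cyl back into a regular column
)
print(summary)

# value_counts is a shortcut when only the counts are needed
counts = sample["cyl"].value_counts().sort_index()
print(counts)
```

This is similar to building a pivot table in Excel with `cyl` as the row field.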