# Lab 2: Pandas and numpy

In [1]:
import numpy as np
import pandas as pd

## EXERCISE 1: How fast is your car?

Read in automobile data from a CSV file, storing the data in a dataframe `dfcars`.

````` 
Description
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and ten aspects of automobile design and performance for 32 automobiles (1973–74 models).

Format

A data frame with 32 observations on 11 variables.

[, 1]	mpg	     Miles/(US) gallon
[, 2]	cyl	     Number of cylinders
[, 3]	disp	 Displacement (cu.in.)
[, 4]	hp	     Gross horsepower
[, 5]	drat	 Rear axle ratio
[, 6]	wt	     Weight (1000 lbs)
[, 7]	qsec	 1/4 mile time
[, 8]	vs	     V/S
[, 9]	am	     Transmission (0 = automatic, 1 = manual)
[,10]	gear	 Number of forward gears
[,11]	carb	 Number of carburetors
        
Source
Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
`````

In [2]:
!wget https://raw.githubusercontent.com/bitanb1999/NumericalPython/main/data/mtcars.csv

--2021-01-15 12:56:12--  https://raw.githubusercontent.com/bitanb1999/NumericalPython/main/data/mtcars.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1783 (1.7K) [text/plain]
Saving to: ‘mtcars.csv’


2021-01-15 12:56:12 (33.2 MB/s) - ‘mtcars.csv’ saved [1783/1783]



In [4]:
dfcars=pd.read_csv("mtcars.csv")
dfcars.head()

,Unnamed: 0,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
0,Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
1,Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
2,Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
3,Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
4,Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2


That first column is messed up. Let's fix it.

The first column, which seems to be the name of the car, does not have a name. Here are the first 3 lines of the file:

```
"","mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
"Mazda RX4",21,6,160,110,3.9,2.62,16.46,0,1,4,4
"Mazda RX4 Wag",21,6,160,110,3.9,2.875,17.02,0,1,4,4
```
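
A minimal sketch of where the `Unnamed: 0` label comes from: when the first header field is empty, `read_csv` fills in a placeholder name. The snippet reads an in-memory copy of the three lines above through `io.StringIO` (an assumption made here so it runs without the downloaded file). It also shows one possible alternative to renaming, `index_col=0`, which turns the car names into the row index.

```python
import io
import pandas as pd

# In-memory copy of the first three lines of mtcars.csv shown above.
sample = '''"","mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
"Mazda RX4",21,6,160,110,3.9,2.62,16.46,0,1,4,4
"Mazda RX4 Wag",21,6,160,110,3.9,2.875,17.02,0,1,4,4
'''

# Default read: the empty header field becomes the placeholder "Unnamed: 0".
raw = pd.read_csv(io.StringIO(sample))
print(raw.columns[0])

# Option 1: rename the placeholder column, as this lab does.
renamed = raw.rename(columns={"Unnamed: 0": "name"})
print(renamed.columns[0])

# Option 2 (alternative): use the first column as the row index instead.
indexed = pd.read_csv(io.StringIO(sample), index_col=0)
print(indexed.index.tolist())
```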

In [5]:
dfcars = dfcars.rename(columns={"Unnamed: 0": "name"})
dfcars.head()

,name,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
0,Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
1,Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
2,Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
3,Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
4,Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2


Columns in a dataframe (series) come with their own types. Some data may be categorical, that is, they take only a few well-defined values. An example is cylinders (`cyl`): cars may have 4, 6, or 8 cylinders. There is an ordered interpretation to this (an 8-cylinder engine is more powerful than a 6-cylinder one) but also a one-of-three-types interpretation.

Sometimes categorical data does not have an ordered interpretation. An example is `am`: a boolean variable which indicates whether the car has an automatic (0) or a manual (1) transmission.

Other column types are integer, floating-point, and `object`. The last is a catch-all for strings or anything Pandas cannot infer, for example, a column that contains data of mixed types.
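
To make the ordered versus unordered distinction concrete, here is a small sketch using pandas' categorical dtype. The frame `toy` and its values are made up for illustration; only the column names mirror the dataset. Converting `cyl` and `am` this way is one possible choice, not something the lab itself does.

```python
import pandas as pd

# Tiny hand-made frame with mtcars-style column names (illustrative values only).
toy = pd.DataFrame({
    "name": ["car A", "car B", "car C"],
    "cyl": [6, 4, 8],
    "am": [1, 1, 0],
    "mpg": [21.0, 22.8, 18.7],
})
print(toy.dtypes)

# Ordered categorical: 4 < 6 < 8 cylinders.
toy["cyl"] = pd.Categorical(toy["cyl"], categories=[4, 6, 8], ordered=True)

# Unordered categorical: automatic vs. manual has no natural ranking.
toy["am"] = toy["am"].astype("category")

print(toy["cyl"].cat.ordered, toy["am"].cat.ordered)

# Comparisons and max() are meaningful only for the ordered column.
more_than_four = (toy["cyl"] > 4).tolist()
print(more_than_four, toy["cyl"].max())
```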

Write code to check the datatypes of the columns

In [12]:
dfcars.dtypes


name     object
mpg     float64
cyl       int64
disp    float64
hp        int64
drat    float64
wt      float64
qsec    float64
vs        int64
am        int64
gear      int64
carb      int64
dtype: object

Write code to get the unique values of car cylinders

In [13]:
dfcars.cyl.unique()


array([6, 4, 8])

Write code to get the dataframe of the 6 and 8 cylinder cars

In [28]:
df=dfcars[dfcars.cyl.isin([6,8])]
df.head()


,name,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
0,Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
1,Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
3,Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
4,Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
5,Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1


Write code to get the dataframe of cars with fuel economy greater than 30 mpg

In [29]:
dfcars[dfcars.mpg > 30]

,name,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
17,Fiat 128,32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1
18,Honda Civic,30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2
19,Toyota Corolla,33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1
27,Lotus Europa,30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2


Save these high-mpg cars in a new CSV file in the data folder, i.e. in the file "data/highspeed.csv"

In [39]:
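import os
os.makedirs("data", exist_ok=True)  # make sure the data folder exists
# index=True writes the row labels as an unnamed first column
dfcars[dfcars.mpg > 30].to_csv("data/highspeed.csv",header=True,index=True)
df=pd.read_csv("data/highspeed.csv")
# on re-reading, that unnamed column is called "Unnamed: 0"; give it a proper name
df.rename(columns={"Unnamed: 0":"number"})

,number,name,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb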
0,17,Fiat 128,32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1
1,18,Honda Civic,30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2
2,19,Toyota Corolla,33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1
3,27,Lotus Europa,30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2
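
The extra numeric column in this output is the saved row index. As a hedged sketch of one way to avoid it, the snippet below compares `to_csv` with and without `index=False`. The frame `high` is a two-row stand-in with values copied from the table above, and `io.StringIO` buffers replace real files (both are assumptions made so the example is self-contained).

```python
import io
import pandas as pd

# Small stand-in for the filtered frame; labels 17 and 18 mimic the original index.
high = pd.DataFrame(
    {"name": ["Fiat 128", "Honda Civic"], "mpg": [32.4, 30.4]},
    index=[17, 18],
)

# Default (index=True): the row labels are written as an unnamed first column.
with_index = io.StringIO()
high.to_csv(with_index)
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()
print(cols_with)

# index=False: only the data columns are written.
no_index = io.StringIO()
high.to_csv(no_index, index=False)
no_index.seek(0)
cols_without = pd.read_csv(no_index).columns.tolist()
print(cols_without)
```

Whether to drop the index depends on whether the original row numbers are still needed after the round trip.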


Create a 2-D numpy array from the mpg and wt columns of `dfcars`

In [41]:
mpg_wt = dfcars[['mpg','wt']].values  # 2-D array, one row per car
type(mpg_wt)

numpy.ndarray

Reshape this array into a wide style, in other words, transpose it

In [42]:
dfcars[['mpg','wt']].values.T


array([[21.   , 21.   , 22.8  , 21.4  , 18.7  , 18.1  , 14.3  , 24.4  ,
        22.8  , 19.2  , 17.8  , 16.4  , 17.3  , 15.2  , 10.4  , 10.4  ,
        14.7  , 32.4  , 30.4  , 33.9  , 21.5  , 15.5  , 15.2  , 13.3  ,
        19.2  , 27.3  , 26.   , 30.4  , 15.8  , 19.7  , 15.   , 21.4  ],
       [ 2.62 ,  2.875,  2.32 ,  3.215,  3.44 ,  3.46 ,  3.57 ,  3.19 ,
         3.15 ,  3.44 ,  3.44 ,  4.07 ,  3.73 ,  3.78 ,  5.25 ,  5.424,
         5.345,  2.2  ,  1.615,  1.835,  2.465,  3.52 ,  3.435,  3.84 ,
         3.845,  1.935,  2.14 ,  1.513,  3.17 ,  2.77 ,  3.57 ,  2.78 ]])

In [43]:
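myarr=dfcars[['mpg','wt']].values
# Note: reshape keeps the row-major element order, so this is NOT a transpose;
# the mpg and wt values end up interleaved in each row (compare with .T above).
myarr.reshape(myarr.shape[1],myarr.shape[0])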


array([[21.   ,  2.62 , 21.   ,  2.875, 22.8  ,  2.32 , 21.4  ,  3.215,
        18.7  ,  3.44 , 18.1  ,  3.46 , 14.3  ,  3.57 , 24.4  ,  3.19 ,
        22.8  ,  3.15 , 19.2  ,  3.44 , 17.8  ,  3.44 , 16.4  ,  4.07 ,
        17.3  ,  3.73 , 15.2  ,  3.78 , 10.4  ,  5.25 , 10.4  ,  5.424],
       [14.7  ,  5.345, 32.4  ,  2.2  , 30.4  ,  1.615, 33.9  ,  1.835,
        21.5  ,  2.465, 15.5  ,  3.52 , 15.2  ,  3.435, 13.3  ,  3.84 ,
        19.2  ,  3.845, 27.3  ,  1.935, 26.   ,  2.14 , 30.4  ,  1.513,
        15.8  ,  3.17 , 19.7  ,  2.77 , 15.   ,  3.57 , 21.4  ,  2.78 ]])
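
The two outputs above have the same shape but different contents. A minimal sketch of why, on a three-row array whose values are copied from the first rows of the dataset (the names `arr`, `transposed`, and `reshaped` are illustrative): `.T` swaps the axes, while `reshape` only re-cuts the same row-major sequence of elements.

```python
import numpy as np

# Three (mpg, wt) rows taken from the first rows of the table above.
arr = np.array([[21.0, 2.62],
                [21.0, 2.875],
                [22.8, 2.32]])

transposed = arr.T            # shape (2, 3): row 0 is all mpg, row 1 is all wt
reshaped = arr.reshape(2, 3)  # shape (2, 3): same element order, just re-cut

print(transposed)
print(reshaped)

# Same shape, different layout.
same = np.array_equal(transposed, reshaped)
print(same)
```

For the "wide style" asked for in this exercise, the transpose is the one that keeps each variable in its own row.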