# MPG Cars

Check out the [Cars Exercises Video Tutorial](https://www.youtube.com/watch?v=avzLRBxoguU&list=PLgJhDSE2ZLxaY_DigHeiIDC1cD09rXgJv&index=3) to watch a data scientist go through the exercises.

### Introduction:

The following exercise uses data from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np

### Step 2. Import the datasets [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv).

### Step 3. Assign each to a variable called cars1 and cars2

In [2]:
cars1 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv")
cars2 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv")

print(cars1.head())
print(cars2.head())

    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70   
1  15.0          8           350        165    3693          11.5     70   
2  18.0          8           318        150    3436          11.0     70   
3  16.0          8           304        150    3433          12.0     70   
4  17.0          8           302        140    3449          10.5     70   

   origin                        car  Unnamed: 9  Unnamed: 10  Unnamed: 11  \
0       1  chevrolet chevelle malibu         NaN          NaN          NaN   
1       1          buick skylark 320         NaN          NaN          NaN   
2       1         plymouth satellite         NaN          NaN          NaN   
3       1              amc rebel sst         NaN          NaN          NaN   
4       1                ford torino         NaN          NaN          NaN   

   Unnamed: 12  Unnamed: 13  
0          NaN          NaN  
1          NaN

### Step 4. Oops, it seems our first dataset has some unnamed blank columns. Fix cars1

In [5]:
cars1 = cars1.loc[:,'mpg':'car']
cars1

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 193 | 24.0 | 6 | 200 | 81 | 3012 | 17.6 | 76 | 1 | ford maverick |
| 194 | 22.5 | 6 | 232 | 90 | 3085 | 17.6 | 76 | 1 | amc hornet |
| 195 | 29.0 | 4 | 85 | 52 | 2035 | 22.2 | 76 | 1 | chevrolet chevette |
| 196 | 24.5 | 4 | 98 | 60 | 2164 | 22.1 | 76 | 1 | chevrolet woody |
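The label slice above works because the real columns are contiguous. As an alternative, here is a minimal sketch of two other ways to drop empty columns. It uses a small made-up frame (`toy`, not the real cars1 data) so it runs without a download.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cars1: two real columns plus two empty ones.
toy = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "car": ["ford torino", "amc hornet"],
    "Unnamed: 2": [np.nan, np.nan],
    "Unnamed: 3": [np.nan, np.nan],
})

# Option A: drop every column whose values are all missing.
by_content = toy.dropna(axis="columns", how="all")

# Option B: drop columns whose name starts with "Unnamed".
by_name = toy.loc[:, ~toy.columns.str.startswith("Unnamed")]

print(list(by_content.columns))
print(list(by_name.columns))
```

Option A depends only on the data, so it would also remove a named column that happens to be entirely empty. Option B depends only on the header text.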




### Step 5. What is the number of observations in each dataset?

In [14]:
# A notebook cell displays only its last expression, so print both shapes.
print(cars1.shape)
print(cars2.shape)

    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70   
1  15.0          8           350        165    3693          11.5     70   
2  18.0          8           318        150    3436          11.0     70   
3  16.0          8           304        150    3433          12.0     70   
4  17.0          8           302        140    3449          10.5     70   

   origin                        car  
0       1  chevrolet chevelle malibu  
1       1          buick skylark 320  
2       1         plymouth satellite  
3       1              amc rebel sst  
4       1                ford torino  
    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  33.0          4            91         53    1795          17.4     76   
1  20.0          6           225        100    3651          17.7     76   
2  18.0          6           250         78    3574          21.0     76   
3  18
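For reference, `shape` is a `(rows, columns)` tuple, so the number of observations is its first element. A minimal sketch on a made-up three-row frame (`toy` is an illustrative name, not part of the exercise):

```python
import pandas as pd

# Hypothetical three-row frame standing in for a cars dataset.
toy = pd.DataFrame({"mpg": [18.0, 15.0, 16.0], "cylinders": [8, 8, 8]})

n_rows, n_cols = toy.shape  # unpack the (rows, columns) tuple
print(n_rows, n_cols)
print(len(toy))             # len() returns the row count only
```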

### Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [16]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result.
cars = pd.concat([cars1, cars2])
cars

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl |
| 196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup |
| 197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage |
| 198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger |
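Note that the row labels restart partway through the combined frame, because each source frame keeps its own index. If unique labels are wanted, `pd.concat` accepts `ignore_index=True`. A minimal sketch with two made-up frames (`a` and `b` are illustrative names):

```python
import pandas as pd

# Hypothetical small frames standing in for cars1 and cars2.
a = pd.DataFrame({"mpg": [18.0, 15.0]})
b = pd.DataFrame({"mpg": [33.0, 20.0, 18.0]})

kept = pd.concat([a, b])                           # original labels kept
renumbered = pd.concat([a, b], ignore_index=True)  # fresh 0..n-1 labels

print(kept.index.tolist())
print(renumbered.index.tolist())
print(kept.index.is_unique, renumbered.index.is_unique)
```

Whether to renumber is a design choice: keeping the original labels preserves each row's position in its source file, while renumbering makes label-based lookups unambiguous.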


### Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

In [27]:
# high is exclusive, so the values fall in [15000, 73000).
owners = np.random.randint(15000, high=73000, size=398, dtype='l')
owners

array([61382, 30625, 72300, 40139, 18008, 52132, 16994, 24264, 20149,
       62338, 52541, 27116, 26045, 34324, 57515, 68321, 56027, 28911,
       44707, 63683, 58167, 36520, 35665, 28661, 70226, 18788, 15369,
       32092, 15137, 52156, 59682, 35418, 27855, 72944, 15520, 48978,
       37456, 41640, 61352, 48034, 62317, 36381, 16195, 49673, 15528,
       31888, 65149, 16741, 51306, 63455, 30260, 26583, 43024, 46504,
       72704, 38752, 55736, 52447, 63979, 19025, 43696, 37620, 41197,
       51179, 21194, 49777, 53677, 18707, 44156, 65999, 25361, 63534,
       17257, 62089, 52815, 25782, 48754, 35447, 29848, 70952, 19960,
       54195, 65019, 70527, 20480, 42148, 72876, 39914, 34106, 51100,
       18206, 18449, 44214, 66840, 36546, 17440, 38535, 64359, 58141,
       50259, 34279, 33883, 15361, 43508, 25412, 25215, 69948, 23315,
       17310, 65780, 62762, 43517, 50261, 48354, 40096, 71003, 21393,
       22093, 19669, 19109, 63330, 62652, 42799, 52084, 31952, 68375,
       62067, 42585,
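The cell above produces a NumPy array rather than a pandas Series. A minimal sketch of one way to build an actual Series, assuming NumPy's newer `Generator` API is acceptable here; the seed value and the `name="owners"` label are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so the draw is reproducible (the seed is arbitrary).
rng = np.random.default_rng(seed=0)

# endpoint=True makes the upper bound inclusive: values in [15000, 73000].
values = rng.integers(15000, 73000, size=398, endpoint=True)
owners = pd.Series(values, name="owners")

print(owners.head())
print(owners.min() >= 15000, owners.max() <= 73000)
```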

### Step 8. Add the column owners to cars

In [28]:
cars['Own_code'] = owners

In [29]:
cars.head(100)

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car | Own_code |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu | 61382 |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 | 30625 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite | 72300 |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst | 40139 |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino | 18008 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 12.0 | 8 | 455 | 225 | 4951 | 11.0 | 73 | 1 | buick electra 225 custom | 17440 |
| 96 | 13.0 | 8 | 360 | 175 | 3821 | 11.0 | 73 | 1 | amc ambassador brougham | 38535 |
| 97 | 18.0 | 6 | 225 | 105 | 3121 | 16.5 | 73 | 1 | plymouth valiant | 64359 |
| 98 | 16.0 | 6 | 250 | 100 | 3278 | 18.0 | 73 | 1 | chevrolet nova custom | 58141 |
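Assigning a NumPy array fills the new column by position, which is what this step relies on. Assigning a Series instead would align on index labels, which matters when the frame has repeated labels after concatenation. A minimal sketch of the difference, using made-up data (`df`, `codes`, and the column names are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with repeated labels 0, 1, 0, 1, as after a plain concat.
df = pd.concat([pd.DataFrame({"mpg": [18.0, 15.0]}),
                pd.DataFrame({"mpg": [33.0, 20.0]})])
codes = np.array([101, 102, 103, 104])

df["from_array"] = codes              # assigned row by row, by position
df["from_series"] = pd.Series(codes)  # aligned on labels: 0 -> 101, 1 -> 102

print(df)
```

In the Series case, the values stored under labels 2 and 3 are never used, because the frame has no rows with those labels.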
