# MPG Cars

Check out the [Cars Exercises Video Tutorial](https://www.youtube.com/watch?v=avzLRBxoguU&list=PLgJhDSE2ZLxaY_DigHeiIDC1cD09rXgJv&index=3) to watch a data scientist go through the exercises.

### Introduction:

This exercise uses data from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np

### Step 2. Import the two datasets, [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv).

### Step 3. Assign each to a variable, called cars1 and cars2

In [2]:
cars1 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv")
cars2 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv")

print(cars1.head())
print(cars2.head())

    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70   
1  15.0          8           350        165    3693          11.5     70   
2  18.0          8           318        150    3436          11.0     70   
3  16.0          8           304        150    3433          12.0     70   
4  17.0          8           302        140    3449          10.5     70   

   origin                        car  Unnamed: 9  Unnamed: 10  Unnamed: 11  \
0       1  chevrolet chevelle malibu         NaN          NaN          NaN   
1       1          buick skylark 320         NaN          NaN          NaN   
2       1         plymouth satellite         NaN          NaN          NaN   
3       1              amc rebel sst         NaN          NaN          NaN   
4       1                ford torino         NaN          NaN          NaN   

   Unnamed: 12  Unnamed: 13  
0          NaN          NaN  
1          NaN
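
The `Unnamed: N` labels are what pandas assigns when a header field is empty, which typically happens when each row of the CSV ends in extra commas. The sketch below reproduces the effect on a small made-up CSV string (the `raw` text and the `toy` name are illustrative and not taken from the real file):

```python
import io

import pandas as pd

# Hypothetical CSV text: every row ends with two extra commas,
# so the header contains two empty field names.
raw = "mpg,car,,\n18.0,ford torino,,\n15.0,buick skylark 320,,\n"

toy = pd.read_csv(io.StringIO(raw))

# pandas fills the empty header fields with "Unnamed: <position>".
print(toy.columns.tolist())

# The matching data cells are empty, so they are read as NaN.
print(toy["Unnamed: 2"].isna().all())
```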

### Step 4. Oops, it seems our first dataset has some unnamed blank columns. Fix cars1

In [3]:
cars1 = cars1.loc[:, "mpg":"car"]
cars1.head()

,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin,car
0,18.0,8,307,130,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350,165,3693,11.5,70,1,buick skylark 320
2,18.0,8,318,150,3436,11.0,70,1,plymouth satellite
3,16.0,8,304,150,3433,12.0,70,1,amc rebel sst
4,17.0,8,302,140,3449,10.5,70,1,ford torino
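
The label slice `"mpg":"car"` works because the blank columns all come after `car`. As a hedged alternative that does not depend on column order, the sketch below drops every all-NaN column from a small made-up frame (`toy` and its values are illustrative, not the real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame mimicking cars1: two real columns plus two empty ones.
toy = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "car": ["chevrolet chevelle malibu", "buick skylark 320"],
    "Unnamed: 9": [np.nan, np.nan],
    "Unnamed: 10": [np.nan, np.nan],
})

# Option 1: drop columns in which every value is missing.
cleaned = toy.dropna(axis="columns", how="all")

# Option 2: filter out columns whose label starts with "Unnamed".
by_name = toy.loc[:, ~toy.columns.str.startswith("Unnamed")]

print(cleaned.columns.tolist())
print(by_name.columns.tolist())
```

Option 1 would also remove a legitimately named column that happens to be entirely empty, so the name-based filter may be the safer choice when that matters.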


### Step 5. What is the number of observations in each dataset?

In [4]:
print(cars1.shape)
print(cars2.shape)

(198, 9)
(200, 9)
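
`shape` returns a `(rows, columns)` tuple, so the number of observations is its first element. A minimal sketch on made-up frames (`toy1` and `toy2` are illustrative names) shows two equivalent ways to read the row count:

```python
import pandas as pd

# Hypothetical stand-ins for cars1 and cars2.
toy1 = pd.DataFrame({"mpg": [18.0, 15.0, 18.0], "cylinders": [8, 8, 8]})
toy2 = pd.DataFrame({"mpg": [27.0, 44.0], "cylinders": [4, 4]})

n1 = toy1.shape[0]  # first element of (rows, columns)
n2 = len(toy2)      # len() of a DataFrame is also its row count
total = n1 + n2     # rows expected after stacking the two frames

print(n1, n2, total)
```

For the real data, the same arithmetic gives 198 + 200 = 398 rows, which is the `size` used when generating the owners values in Step 7.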


### Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [26]:
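# DataFrame.append was removed in pandas 2.0, so use pd.concat instead.
# ignore_index=True renumbers the rows 0..n-1 instead of repeating each frame's index.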
cars = pd.concat([cars1, cars2], ignore_index=True)
cars

,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin,car
0,18.0,8,307,130,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350,165,3693,11.5,70,1,buick skylark 320
2,18.0,8,318,150,3436,11.0,70,1,plymouth satellite
3,16.0,8,304,150,3433,12.0,70,1,amc rebel sst
4,17.0,8,302,140,3449,10.5,70,1,ford torino
...,...,...,...,...,...,...,...,...,...
393,27.0,4,140,86,2790,15.6,82,1,ford mustang gl
394,44.0,4,97,52,2130,24.6,82,2,vw pickup
395,32.0,4,135,84,2295,11.6,82,1,dodge rampage
396,28.0,4,120,79,2625,18.6,82,1,ford ranger
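
To see what `ignore_index=True` changes, the sketch below concatenates two small made-up frames with and without it (`toy1`, `toy2`, `kept`, and `fresh` are illustrative names):

```python
import pandas as pd

# Hypothetical stand-ins for cars1 and cars2, each with a default 0..n-1 index.
toy1 = pd.DataFrame({"mpg": [18.0, 15.0]})
toy2 = pd.DataFrame({"mpg": [27.0, 44.0]})

# Without ignore_index, each frame keeps its own labels, so they repeat.
kept = pd.concat([toy1, toy2])

# With ignore_index=True, the result gets a fresh 0..n-1 index.
fresh = pd.concat([toy1, toy2], ignore_index=True)

print(kept.index.tolist())   # [0, 1, 0, 1]
print(fresh.index.tolist())  # [0, 1, 2, 3]
```

Duplicate labels make `.loc` lookups return several rows, so a fresh index is usually the more convenient choice when the original labels carry no meaning.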


### Step 7. Oops, there is a column missing, called owners. Create a Series of random integers from 15,000 to 73,000, one for each row of cars.

In [22]:
# high is exclusive, so 73001 makes 73,000 a possible value; size matches the 398 rows of cars.
nr_owners = np.random.randint(15000, high=73001, size=398, dtype='l')
nr_owners

array([50848, 22511, 50815, 34869, 19335, 59550, 33280, 19992, 44653,
       55399, 53063, 48496, 70066, 19401, 27463, 59553, 39547, 59385,
       47215, 31526, 29960, 67758, 71206, 26723, 68495, 39453, 21659,
       50058, 21848, 50580, 44420, 54230, 50043, 66552, 64966, 47064,
       40411, 35374, 55149, 24896, 72230, 56164, 15163, 40935, 68019,
       46523, 53362, 19540, 63813, 48095, 49916, 67673, 61722, 22490,
       55741, 15813, 56006, 15202, 61658, 42772, 66271, 21386, 29119,
       47278, 30120, 50618, 33807, 27613, 45580, 65472, 33606, 71959,
       37503, 37428, 69231, 64013, 70082, 38339, 41192, 43151, 39773,
       53920, 71910, 35148, 68583, 58757, 48243, 55273, 34704, 20316,
       55338, 34467, 25669, 65850, 68943, 39088, 52525, 27311, 23921,
       40146, 28581, 61712, 45760, 22255, 55725, 15938, 55674, 67202,
       48767, 65625, 18668, 53419, 32469, 62272, 72289, 34119, 27126,
       57825, 28705, 23999, 43020, 16902, 67446, 62264, 68740, 16463,
       51654, 70781,
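
Because the values are random, rerunning the cell gives different numbers each time. A hedged sketch of a reproducible variant uses NumPy's `Generator` API with a fixed seed and wraps the result in a Series; the seed `42` and the names `rng`, `n_rows`, and `owners` are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

n_rows = 398  # assumed to match the number of rows in cars

# A seeded generator returns the same sequence on every run.
rng = np.random.default_rng(seed=42)

# endpoint=True makes the upper bound inclusive, so 73000 itself can be drawn.
owners = pd.Series(
    rng.integers(15000, 73000, size=n_rows, endpoint=True),
    name="owners",
)

print(owners.min() >= 15000, owners.max() <= 73000, len(owners))
```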

### Step 8. Add the column owners to cars

In [23]:
cars['owners'] = nr_owners
cars.tail()

,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin,car,owners
393,27.0,4,140,86,2790,15.6,82,1,ford mustang gl,64475
394,44.0,4,97,52,2130,24.6,82,2,vw pickup,50098
395,32.0,4,135,84,2295,11.6,82,1,dodge rampage,41333
396,28.0,4,120,79,2625,18.6,82,1,ford ranger,38290
397,31.0,4,119,82,2720,19.4,82,1,chevy s-10,31421
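
Assigning a NumPy array to a new column matches values to rows by position, so the array length has to equal the number of rows. The sketch below illustrates this on a small made-up frame (`toy`, `with_copy`, and `mismatch_error` are illustrative names):

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for cars.
toy = pd.DataFrame({"car": ["ford torino", "vw pickup", "chevy s-10"]})

# Same length as the frame: values are matched to rows by position.
toy["owners"] = np.array([100, 200, 300])

# assign() returns a new frame and leaves the original untouched.
with_copy = toy.assign(owners_doubled=toy["owners"] * 2)

# A length mismatch raises ValueError instead of silently padding.
mismatch_error = None
try:
    toy["bad"] = np.array([1, 2])
except ValueError as exc:
    mismatch_error = str(exc)

print(toy["owners"].tolist())
print(mismatch_error is not None)
```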
