# MPG Cars

Check out the [Cars Exercises Video Tutorial](https://www.youtube.com/watch?v=avzLRBxoguU&list=PLgJhDSE2ZLxaY_DigHeiIDC1cD09rXgJv&index=3) to watch a data scientist work through the exercises.

### Introduction:

The following exercise uses data from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np

### Step 2. Import the datasets [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv).

### Step 3. Assign each to a variable called cars1 and cars2

In [3]:
cars1 = pd.read_csv('https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv')
cars2 = pd.read_csv('https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv')
print(cars1.head())
print(cars2.head())

    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70   
1  15.0          8           350        165    3693          11.5     70   
2  18.0          8           318        150    3436          11.0     70   
3  16.0          8           304        150    3433          12.0     70   
4  17.0          8           302        140    3449          10.5     70   

   origin                        car  Unnamed: 9  Unnamed: 10  Unnamed: 11  \
0       1  chevrolet chevelle malibu         NaN          NaN          NaN   
1       1          buick skylark 320         NaN          NaN          NaN   
2       1         plymouth satellite         NaN          NaN          NaN   
3       1              amc rebel sst         NaN          NaN          NaN   
4       1                ford torino         NaN          NaN          NaN   

   Unnamed: 12  Unnamed: 13  
0          NaN          NaN  
1          NaN

### Step 4. Oops, it seems our first dataset has some unnamed blank columns. Fix cars1

In [4]:
cars1 = cars1.loc[:, "mpg":"car"]
cars1.head()

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
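
The label slice in Step 4 works because the blank columns all sit to the right of `car`. As a hedged sketch of two alternatives that do not depend on column order, the example below builds a tiny in-memory stand-in for cars1.csv (the `raw` text and the `sample` name are made up for illustration, not taken from the exercise) and compares the three approaches.

```python
import io

import pandas as pd

# Hypothetical miniature of cars1.csv: the trailing commas produce
# empty columns that pandas names "Unnamed: 3" and "Unnamed: 4".
raw = io.StringIO(
    "mpg,cylinders,car,,\n"
    "18.0,8,chevrolet chevelle malibu,,\n"
    "15.0,8,buick skylark 320,,\n"
)
sample = pd.read_csv(raw)

# Option 1: label-based slice, as used in Step 4.
by_slice = sample.loc[:, "mpg":"car"]

# Option 2: drop every column whose values are all NaN.
by_dropna = sample.dropna(axis="columns", how="all")

# Option 3: keep only columns whose name does not start with "Unnamed".
by_name = sample.loc[:, ~sample.columns.str.startswith("Unnamed")]

print(list(by_slice.columns))
```

Option 2 would also drop a named column that happens to be entirely empty, so which variant fits best depends on the data.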


### Step 5. What is the number of observations in each dataset?

In [5]:
print(cars1.shape)
print(cars2.shape)

(198, 9)
(200, 9)
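
`shape` returns a `(rows, columns)` tuple, so the number of observations is its first element. A minimal sketch on a made-up three-row frame (the `frame` name and its values are illustrative only):

```python
import pandas as pd

# Hypothetical toy frame standing in for cars1 or cars2.
frame = pd.DataFrame({"mpg": [18.0, 15.0, 18.0], "cylinders": [8, 8, 8]})

n_rows, n_cols = frame.shape  # unpack (rows, columns)
print(n_rows)                 # number of observations
print(len(frame))             # len() also counts rows
```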


### Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [7]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the supported replacement
cars = pd.concat([cars1, cars2])
cars

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl |
| 196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup |
| 197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage |
| 198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger |
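
Stacking two frames keeps each frame's original row labels, which is why the combined index does not run from 0 to 397. A small sketch of the difference `ignore_index=True` makes, using two made-up two-row frames (`first` and `second` are illustrative names, not part of the exercise):

```python
import pandas as pd

# Hypothetical stand-ins for cars1 and cars2.
first = pd.DataFrame({"mpg": [18.0, 15.0], "car": ["a", "b"]})
second = pd.DataFrame({"mpg": [27.0, 44.0], "car": ["c", "d"]})

# Default: each frame keeps its own labels, so labels repeat.
stacked = pd.concat([first, second])

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([first, second], ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```

Renumbering is usually the safer choice when the original row labels carry no meaning, since later label-based lookups then return exactly one row.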


### Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

In [8]:
nr_owners = np.random.randint(15000, high=73001, size=398, dtype='l')
nr_owners

array([49522, 58692, 27045, 56222, 50889, 53621, 64713, 71006, 50414,
       58635, 61237, 53508, 63122, 40054, 65868, 71371, 58949, 42157,
       53813, 65499, 32309, 51033, 70834, 41781, 56549, 35315, 26478,
       40523, 30544, 51705, 44971, 72260, 45551, 59737, 19950, 70862,
       57663, 20509, 33769, 24086, 67883, 55968, 16872, 41877, 17828,
       36860, 18750, 60223, 27909, 66467, 48923, 30740, 65042, 58516,
       49926, 48106, 58749, 22475, 70951, 23177, 41048, 60633, 62974,
       45205, 61468, 37055, 25588, 44835, 32108, 54647, 68946, 53297,
       46035, 38963, 21981, 21106, 62347, 59606, 38477, 51134, 62584,
       32136, 27774, 16421, 36766, 43174, 18896, 16403, 63886, 20614,
       64519, 66515, 30303, 55130, 32033, 68882, 50718, 31950, 30583,
       16109, 32270, 57254, 28182, 24243, 71638, 16466, 51972, 66402,
       53523, 56905, 57449, 43747, 33890, 22312, 67331, 31297, 56738,
       66973, 64369, 36420, 44113, 31121, 18910, 35150, 62179, 60952,
       32057, 67851,
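
The step asks for a Series, while `np.random.randint` returns a plain array. As one possible variant, the sketch below uses NumPy's newer `Generator` API with a fixed seed so the values are reproducible, and wraps the result in a Series. The seed value and the `owners` name are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the "random" column repeatable between runs.
rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily

# endpoint=True makes the upper bound inclusive, so 73000 itself can occur.
values = rng.integers(15000, 73000, size=398, endpoint=True)

# Wrap the array in a Series with a default 0..397 index.
owners = pd.Series(values, name="owners")
print(owners.head())
```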

### Step 8. Add the column owners to cars

In [9]:
cars['owners'] = nr_owners
cars

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car | owners |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu | 49522 |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 | 58692 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite | 27045 |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst | 56222 |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino | 50889 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl | 65267 |
| 196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup | 42272 |
| 197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage | 44892 |
| 198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger | 53438 |
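
Assigning a NumPy array fills the new column by position, which is what makes the step above work even though the combined frame has repeated row labels. Assigning a Series instead aligns on labels, which can silently give a different result. The sketch below illustrates this on a made-up three-row frame with a duplicated label (`frame` and `values` are illustrative names only).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a repeated label, like the concatenated cars frame.
frame = pd.DataFrame({"car": ["a", "b", "c"]}, index=[0, 1, 0])
values = np.array([10, 20, 30])

# Array: assigned row by row, labels are ignored.
frame["by_position"] = values

# Series: matched on labels 0, 1, 2, so both rows labelled 0 receive 10
# and the value stored under label 2 is never used.
frame["by_label"] = pd.Series(values)

print(frame)
```

Passing `.to_numpy()` on a Series, or resetting the frame's index first, avoids the label alignment when positional assignment is what is intended.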
