# MPG Cars

### Introduction:

The following exercise uses data from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np

### Step 2. Import the datasets [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv)

### Step 3. Assign them to variables called cars1 and cars2

In [2]:
cars1 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv")
cars2 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv")
print(cars1.head())
print(cars2.head())

    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70   
1  15.0          8           350        165    3693          11.5     70   
2  18.0          8           318        150    3436          11.0     70   
3  16.0          8           304        150    3433          12.0     70   
4  17.0          8           302        140    3449          10.5     70   

   origin                        car  Unnamed: 9  Unnamed: 10  Unnamed: 11  \
0       1  chevrolet chevelle malibu         NaN          NaN          NaN   
1       1          buick skylark 320         NaN          NaN          NaN   
2       1         plymouth satellite         NaN          NaN          NaN   
3       1              amc rebel sst         NaN          NaN          NaN   
4       1                ford torino         NaN          NaN          NaN   

   Unnamed: 12  Unnamed: 13  
0          NaN          NaN  
1          NaN
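
Columns named `Unnamed: N` usually appear when the header row of a CSV contains empty fields, for example because every line ends in trailing commas. As a minimal sketch, assuming that is the cause here, such columns can be skipped while reading. The CSV text below is a hypothetical stand-in for cars1.csv, and `raw` and `demo` are illustrative names.

```python
import io

import pandas as pd

# Hypothetical CSV text with trailing commas, standing in for cars1.csv.
raw = io.StringIO(
    "mpg,car,,\n"
    "18.0,chevrolet chevelle malibu,,\n"
    "15.0,buick skylark 320,,\n"
)

# A callable passed to usecols is evaluated against each column name;
# only the columns for which it returns True are parsed.
demo = pd.read_csv(raw, usecols=lambda name: not name.startswith("Unnamed"))
print(demo.columns.tolist())
```

Filtering at read time avoids loading the empty columns at all, at the cost of relying on the `Unnamed` naming convention.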

### Step 4. Oops, it seems our first dataset has some unnamed blank columns. Fix cars1

In [3]:
# Keep only the columns from "mpg" through "car"; label slices include both endpoints
cars1 = cars1.loc[:, "mpg":"car"]
cars1.head()

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
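
Slicing by label works when the wanted columns are contiguous and their names are known. An alternative sketch, which makes no assumption about column order, drops every column whose values are all missing. The small frame below is a hypothetical miniature of cars1, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of cars1: two real columns plus two blank ones.
demo = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "car": ["chevrolet chevelle malibu", "buick skylark 320"],
    "Unnamed: 9": [np.nan, np.nan],
    "Unnamed: 10": [np.nan, np.nan],
})

# how="all" removes a column only if every value in it is missing,
# so columns with just a few gaps are kept.
cleaned = demo.dropna(axis="columns", how="all")
print(cleaned.columns.tolist())
```

Note that this would also drop a legitimately named column if it happened to be completely empty.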


### Step 5. What is the number of observations in each dataset?

In [4]:
print(cars1.shape[0])
print(cars2.shape[0])

198
200


### Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [5]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way
cars = pd.concat([cars1, cars2])
cars

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl |
| 196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup |
| 197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage |
| 198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger |
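
When two frames are stacked, each keeps its own row labels, so the combined frame contains repeated index values. The following sketch, using hypothetical two-row stand-ins for cars1 and cars2, shows the default behaviour next to `ignore_index=True`, which renumbers the rows.

```python
import pandas as pd

# Hypothetical two-row stand-ins for cars1 and cars2.
part1 = pd.DataFrame({"mpg": [18.0, 15.0], "car": ["ford torino", "amc rebel sst"]})
part2 = pd.DataFrame({"mpg": [27.0, 44.0], "car": ["ford mustang gl", "vw pickup"]})

# Default: each frame keeps its own row labels, so labels repeat.
stacked = pd.concat([part1, part2])
print(stacked.index.tolist())      # [0, 1, 0, 1]

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
renumbered = pd.concat([part1, part2], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```

Whether to renumber depends on whether the original row labels carry meaning; for a plain row counter, a fresh index is usually easier to work with.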


### Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

In [6]:
# high is exclusive, so 73001 makes 73,000 a possible value; size matches the 398 rows in cars
random_owners = np.random.randint(15000, high=73001, size=398, dtype='I')
random_owners

array([34170, 34909, 54055, 34043, 25038, 28732, 53161, 44811, 66568,
       22458, 38593, 41448, 46645, 60633, 27384, 70795, 53512, 16602,
       39811, 51686, 40738, 38951, 50895, 20967, 24182, 43829, 67732,
       69470, 33699, 39753, 53153, 21047, 52466, 18872, 52132, 48821,
       27268, 57631, 69421, 28813, 59902, 69581, 30162, 43262, 42852,
       31929, 36945, 57329, 42397, 29257, 30887, 61412, 31049, 57684,
       56612, 39655, 43319, 30207, 30951, 59582, 15371, 37445, 49378,
       36262, 38688, 59884, 44964, 60093, 27707, 48976, 57949, 19926,
       33727, 33908, 60208, 64845, 18840, 20123, 72491, 72143, 25898,
       65460, 64336, 47435, 23869, 60360, 56529, 21000, 16275, 67398,
       56718, 27264, 29215, 37697, 35900, 62678, 22193, 63359, 48426,
       37644, 58111, 17330, 69519, 30914, 34733, 26055, 17800, 46431,
       58867, 59133, 20091, 49784, 50541, 52635, 20411, 53664, 63624,
       52073, 72459, 19680, 56855, 67305, 59788, 30442, 47360, 44522,
       55589, 52450,
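
Because the values are random, the numbers above will differ on every run. As a sketch of a reproducible variant, NumPy's `Generator` interface can be seeded and the result wrapped in a pandas Series, as the step title asks. The seed 42 and the names `rng` and `owners` are arbitrary choices for illustration, not part of the original exercise.

```python
import numpy as np
import pandas as pd

# The seed value is an arbitrary choice; any fixed integer gives repeatable draws.
rng = np.random.default_rng(42)

# integers() excludes `high` by default; endpoint=True makes 73000 reachable.
owners = pd.Series(
    rng.integers(15000, 73000, size=398, endpoint=True),
    name="owners",
)

print(len(owners), owners.min(), owners.max())
```

Re-creating the generator with the same seed yields the same sequence, which makes the exercise output comparable between runs.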

### Step 8. Add the column owners to cars

In [8]:
cars['owners'] = random_owners
cars.head()

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car | owners |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu | 34170 |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 | 34909 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite | 54055 |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst | 34043 |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino | 25038 |
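
Assigning a NumPy array to a new column fills it by position, which is why the step above works even though the index of cars contains repeated labels. Assigning a pandas Series instead aligns on index labels, which can give a different result. The sketch below illustrates the difference on a hypothetical four-row frame; `frame`, `values`, `by_position`, and `by_label` are illustrative names.

```python
import pandas as pd

# Hypothetical frame with repeated row labels, as produced by stacking two frames.
frame = pd.DataFrame({"car": ["a", "b", "c", "d"]}, index=[0, 1, 0, 1])
values = pd.Series([10, 20, 30, 40])  # labelled 0, 1, 2, 3

# An array carries no labels, so values are placed row by row.
frame["by_position"] = values.to_numpy()

# A Series is aligned on labels: rows labelled 0 get 10, rows labelled 1 get 20,
# and the values at labels 2 and 3 are not used.
frame["by_label"] = values

print(frame)
```

If the combined frame is renumbered first, or the new values are passed as an array, positional and label-based assignment agree.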
