# Ex - Merge

# MPG Cars

### Introduction:

The following exercise uses data from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

### Step 1. Import the necessary libraries

In [3]:
import pandas as pd
import numpy as np

### Step 2. Import the datasets [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv).

### Step 3. Assign each to a variable, called cars1 and cars2

In [4]:
cars1 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv")
cars2 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv")

display(cars1.head())
display(cars2.head())

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin,car,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13
0,18.0,8,307,130,3504,12.0,70,1,chevrolet chevelle malibu,,,,,
1,15.0,8,350,165,3693,11.5,70,1,buick skylark 320,,,,,
2,18.0,8,318,150,3436,11.0,70,1,plymouth satellite,,,,,
3,16.0,8,304,150,3433,12.0,70,1,amc rebel sst,,,,,
4,17.0,8,302,140,3449,10.5,70,1,ford torino,,,,,


Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin,car
0,33.0,4,91,53,1795,17.4,76,3,honda civic
1,20.0,6,225,100,3651,17.7,76,1,dodge aspen se
2,18.0,6,250,78,3574,21.0,76,1,ford granada ghia
3,18.5,6,250,110,3645,16.2,76,1,pontiac ventura sj
4,17.5,6,258,95,3193,17.8,76,1,amc pacer d/l


### Step 4. Oops, it seems our first dataset has some unnamed blank columns. Fix cars1

In [5]:
cars1 = cars1.loc[:, "mpg":"car"]
cars1.head()

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin,car
0,18.0,8,307,130,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350,165,3693,11.5,70,1,buick skylark 320
2,18.0,8,318,150,3436,11.0,70,1,plymouth satellite
3,16.0,8,304,150,3433,12.0,70,1,amc rebel sst
4,17.0,8,302,140,3449,10.5,70,1,ford torino
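The label slice above works because the blank columns all sit to the right of `car`. As a hedged alternative, the sketch below shows two other ways to drop such columns. It runs on a small made-up frame (`toy` is a hypothetical stand-in, not the real cars1 data), so it does not depend on column order.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for cars1: two real columns plus two all-NaN columns.
toy = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "car": ["chevrolet chevelle malibu", "buick skylark 320"],
    "Unnamed: 9": [np.nan, np.nan],
    "Unnamed: 10": [np.nan, np.nan],
})

# Option A: drop every column whose values are all missing.
by_content = toy.dropna(axis=1, how="all")

# Option B: drop columns whose name starts with "Unnamed".
by_name = toy.loc[:, ~toy.columns.str.startswith("Unnamed")]

print(list(by_content.columns))
print(list(by_name.columns))
```

Option A keys on the data, so it would also drop a named column that happens to be entirely empty. Option B keys on the header text only.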


### Step 5. What is the number of observations in each dataset?

In [6]:
print(cars1.shape)
print(cars2.shape)

(198, 9)
(200, 9)
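`shape` returns a `(rows, columns)` tuple, so the number of observations is its first element. A minimal sketch on two made-up frames (`a` and `b` are hypothetical names) shows the row count on its own and the total to expect once the frames are stacked:

```python
import pandas as pd

# Two small stand-in frames with the same single column.
a = pd.DataFrame({"mpg": [18.0, 15.0, 18.0]})
b = pd.DataFrame({"mpg": [33.0, 20.0]})

# shape[0] and len() both give the number of rows (observations).
rows_a = a.shape[0]
rows_b = len(b)

# Stacking the frames row-wise should yield this many rows.
expected_total = rows_a + rows_b
print(rows_a, rows_b, expected_total)
```

For the datasets above, the same arithmetic gives 198 + 200 = 398 rows.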


### Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [7]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way
cars = pd.concat([cars1, cars2])
cars


Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin,car
0,18.0,8,307,130,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350,165,3693,11.5,70,1,buick skylark 320
2,18.0,8,318,150,3436,11.0,70,1,plymouth satellite
3,16.0,8,304,150,3433,12.0,70,1,amc rebel sst
4,17.0,8,302,140,3449,10.5,70,1,ford torino
...,...,...,...,...,...,...,...,...,...
195,27.0,4,140,86,2790,15.6,82,1,ford mustang gl
196,44.0,4,97,52,2130,24.6,82,2,vw pickup
197,32.0,4,135,84,2295,11.6,82,1,dodge rampage
198,28.0,4,120,79,2625,18.6,82,1,ford ranger
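Note that the joined frame keeps each dataset's original row labels, so the index restarts at 0 partway through. The sketch below, on two tiny made-up frames (`first` and `second` are hypothetical names), contrasts that default with `ignore_index=True`, which renumbers the rows.

```python
import pandas as pd

first = pd.DataFrame({"car": ["ford torino", "amc rebel sst"]})
second = pd.DataFrame({"car": ["honda civic", "ford ranger"]})

# Default: original labels are kept, so 0 and 1 each appear twice.
stacked = pd.concat([first, second])

# ignore_index=True discards the old labels and numbers rows 0..n-1.
renumbered = pd.concat([first, second], ignore_index=True)

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Whether to renumber is a judgment call: duplicate labels are harmless for a quick look at the data, but they make label-based lookups such as `.loc[0]` return more than one row.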


### Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

In [8]:
# high is exclusive, so 73001 makes 73,000 a possible value; 398 = 198 + 200 rows
nr_owners = np.random.randint(15000, high=73001, size=398, dtype='l')
nr_owners

array([56346, 62634, 59108, 72489, 48760, 35559, 17569, 17874, 40382,
       37277, 62408, 18287, 37816, 59962, 44182, 55583, 62432, 69486,
       29618, 38511, 52092, 53507, 35099, 32235, 24445, 45657, 47479,
       54066, 21092, 26394, 53063, 47815, 68531, 21191, 15641, 23285,
       66851, 62193, 53096, 70614, 49402, 55932, 26239, 52730, 70112,
       65475, 32958, 18146, 24542, 68079, 70675, 17548, 28614, 19038,
       24900, 53841, 69286, 58458, 44717, 54621, 27870, 41333, 48152,
       42373, 44462, 51459, 48502, 63241, 23660, 19487, 41182, 58449,
       58653, 21485, 49069, 42306, 35962, 33958, 43282, 33545, 19760,
       42832, 20513, 20541, 31334, 28734, 64870, 16965, 56352, 42023,
       52461, 57316, 40295, 33179, 29588, 45761, 34232, 35598, 19257,
       31757, 50663, 27965, 20336, 72988, 52010, 33209, 42629, 15542,
       70386, 22900, 18092, 53878, 47339, 56011, 18050, 38079, 16536,
       68741, 61928, 36811, 71576, 36692, 33062, 15248, 15124, 65921,
       31942, 38571,
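The cell above returns a NumPy array, and its values change on every run. As a hedged variation, the sketch below uses NumPy's `Generator` API with a fixed seed and wraps the result in a pandas Series, as the step title asks. The seed value 42 and the name `owners` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the "random" values reproducible between runs.
rng = np.random.default_rng(seed=42)

# endpoint=True makes the upper bound inclusive, so 73000 itself can be drawn.
values = rng.integers(15000, 73000, size=398, endpoint=True)

# Wrap the array in a named Series.
owners = pd.Series(values, name="owners")

print(len(owners), owners.min() >= 15000, owners.max() <= 73000)
```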

### Step 8. Add the column owners to cars

In [9]:
cars['owners'] = nr_owners
cars.tail()

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin,car,owners
195,27.0,4,140,86,2790,15.6,82,1,ford mustang gl,22820
196,44.0,4,97,52,2130,24.6,82,2,vw pickup,24922
197,32.0,4,135,84,2295,11.6,82,1,dodge rampage,47752
198,28.0,4,120,79,2625,18.6,82,1,ford ranger,70694
199,31.0,4,119,82,2720,19.4,82,1,chevy s-10,19171
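Assigning a NumPy array to a column, as above, is positional: the first value goes to the first row, and so on. A pandas Series is instead aligned by index label, which matters here because the joined frame has repeated labels. The sketch below illustrates the difference on a made-up four-row frame (`cars_toy` and its values are hypothetical).

```python
import numpy as np
import pandas as pd

first = pd.DataFrame({"car": ["a", "b"]})
second = pd.DataFrame({"car": ["c", "d"]})
cars_toy = pd.concat([first, second])  # index labels: 0, 1, 0, 1

values = np.array([10, 20, 30, 40])

# Array assignment is positional: one value per row, in order.
cars_toy["owners_array"] = values

# Series assignment aligns on labels: rows labelled 0 get 10, rows labelled 1 get 20.
cars_toy["owners_series"] = pd.Series(values)

# Converting the Series to an array first restores positional behaviour.
cars_toy["owners_fixed"] = pd.Series(values).to_numpy()

print(cars_toy)
```

In this sketch the label-aligned column silently loses the values 30 and 40, which is why an array (or a Series converted with `to_numpy()`) is the safer choice when the index contains duplicates.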
