# MPG Cars

### Introduction:

The following exercise uses the Auto MPG data from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

### Step 1. Import the necessary libraries

In [2]:
import pandas as pd
import numpy as np

### Step 2. Import the datasets [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv).

In [3]:
url_cars1 = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv'
url_cars2 = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv'

### Step 3. Assign each to a variable called cars1 and cars2

In [4]:
cars1 = pd.read_csv(url_cars1)
cars2 = pd.read_csv(url_cars2)

print("Cars1 Head:")
print(cars1.head())

Cars1 Head:
    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70   
1  15.0          8           350        165    3693          11.5     70   
2  18.0          8           318        150    3436          11.0     70   
3  16.0          8           304        150    3433          12.0     70   
4  17.0          8           302        140    3449          10.5     70   

   origin                        car  Unnamed: 9  Unnamed: 10  Unnamed: 11  \
0       1  chevrolet chevelle malibu         NaN          NaN          NaN   
1       1          buick skylark 320         NaN          NaN          NaN   
2       1         plymouth satellite         NaN          NaN          NaN   
3       1              amc rebel sst         NaN          NaN          NaN   
4       1                ford torino         NaN          NaN          NaN   

   Unnamed: 12  Unnamed: 13  
0          NaN          NaN  
1 

### Step 4. Oops, it seems our first dataset has some unnamed blank columns, fix cars1

In [5]:

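# Keep only the column names that contain 'Unnamed'. A bare `if 'Unnamed'`
# is always true and would select (and then drop) every column.
unnamed_cols = [col for col in cars1.columns if 'Unnamed' in col]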
cars1 = cars1.drop(columns=unnamed_cols)

print("\nCars1 Head (Fixed):")
print(cars1.head())


Cars1 Head (Fixed):
    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70   
1  15.0          8           350        165    3693          11.5     70   
2  18.0          8           318        150    3436          11.0     70   
3  16.0          8           304        150    3433          12.0     70   
4  17.0          8           302        140    3449          10.5     70   

   origin                        car  
0       1  chevrolet chevelle malibu  
1       1          buick skylark 320  
2       1         plymouth satellite  
3       1              amc rebel sst  
4       1                ford torino  
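
As a side note, the blank columns can be removed either by name or by content. The sketch below uses a small made-up frame (`toy` is a stand-in, not the real `cars1`) to show both options; it assumes the unwanted columns are entirely empty, as the output above suggests.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for cars1: two real columns plus two empty "Unnamed" ones.
toy = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "car": ["ford torino", "amc rebel sst"],
    "Unnamed: 9": [np.nan, np.nan],
    "Unnamed: 10": [np.nan, np.nan],
})

# Option A: drop by name, keeping columns that do not start with "Unnamed".
by_name = toy.loc[:, ~toy.columns.str.startswith("Unnamed")]

# Option B: drop by content, removing every column whose values are all missing.
by_content = toy.dropna(axis=1, how="all")

print(list(by_name.columns))
print(list(by_content.columns))
```

Option B would also remove a legitimately named column if it happened to be completely empty, so the name-based filter is the more conservative choice here.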


### Step 5. What is the number of observations in each dataset?

In [6]:
num_cars1 = len(cars1)
num_cars2 = len(cars2)

print(f"\nNumber of observations in cars1: {num_cars1}")
print(f"Number of observations in cars2: {num_cars2}")


Number of observations in cars1: 198
Number of observations in cars2: 200


### Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [7]:
cars = pd.concat([cars1, cars2], ignore_index=True)

print("\nMerged 'cars' DataFrame Head:")
print(cars.head())
print("\nMerged 'cars' DataFrame Tail:")
print(cars.tail())


Merged 'cars' DataFrame Head:
    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70   
1  15.0          8           350        165    3693          11.5     70   
2  18.0          8           318        150    3436          11.0     70   
3  16.0          8           304        150    3433          12.0     70   
4  17.0          8           302        140    3449          10.5     70   

   origin                        car  
0       1  chevrolet chevelle malibu  
1       1          buick skylark 320  
2       1         plymouth satellite  
3       1              amc rebel sst  
4       1                ford torino  

Merged 'cars' DataFrame Tail:
      mpg  cylinders  displacement horsepower  weight  acceleration  model  \
393  27.0          4           140         86    2790          15.6     82   
394  44.0          4            97         52    2130          24.6     82   
395  32.0    
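
To see why `ignore_index=True` matters in the join above, here is a minimal sketch on two made-up two-row frames (`a` and `b` are illustrative names, not part of the exercise). Without the flag, each frame keeps its own row labels and the result has duplicates.

```python
import pandas as pd

# Two tiny frames that each carry the default index 0, 1.
a = pd.DataFrame({"mpg": [18.0, 15.0]})
b = pd.DataFrame({"mpg": [27.0, 44.0]})

kept = pd.concat([a, b])                      # original labels are preserved
reset = pd.concat([a, b], ignore_index=True)  # labels are renumbered from 0

print(list(kept.index))   # [0, 1, 0, 1]
print(list(reset.index))  # [0, 1, 2, 3]
```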

### Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

In [8]:

total_obs = len(cars)
# np.random.randint excludes `high`, so 73001 is needed for 73,000 to be a possible value
owners = pd.Series(np.random.randint(low=15000, high=73001, size=total_obs))

print("\n'owners' Series Head:")
print(owners.head())


'owners' Series Head:
0    58206
1    45316
2    50651
3    37496
4    54075
dtype: int32
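
Because the values are random, the numbers above will differ on every run. If reproducible output is wanted, one possible approach is NumPy's `Generator` API with a fixed seed. This is only a sketch: the seed `42`, the hard-coded size `398`, and the name `owners_demo` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# A seeded generator yields the same sequence on every run.
rng = np.random.default_rng(seed=42)

# endpoint=True makes the upper bound inclusive, so 73000 itself can be drawn.
owners_demo = pd.Series(rng.integers(15000, 73000, size=398, endpoint=True))

print(owners_demo.min() >= 15000, owners_demo.max() <= 73000)
```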


### Step 8. Add the column owners to cars

In [9]:
cars['owners'] = owners

print("\nFinal 'cars' DataFrame with 'owners' column:")
print(cars.tail())


Final 'cars' DataFrame with 'owners' column:
      mpg  cylinders  displacement horsepower  weight  acceleration  model  \
393  27.0          4           140         86    2790          15.6     82   
394  44.0          4            97         52    2130          24.6     82   
395  32.0          4           135         84    2295          11.6     82   
396  28.0          4           120         79    2625          18.6     82   
397  31.0          4           119         82    2720          19.4     82   

     origin              car  owners  
393       1  ford mustang gl   70484  
394       2        vw pickup   18159  
395       1    dodge rampage   54185  
396       1      ford ranger   15728  
397       1       chevy s-10   67152  
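
Assigning a Series as a column aligns on index labels, not on position. It works in Step 8 because `cars` was built with `ignore_index=True` and therefore shares the default 0..397 index with `owners`. The sketch below, on a made-up frame with a different index, shows what happens when the labels do not match and how to assign positionally instead.

```python
import pandas as pd

# Frame whose index does not start at 0.
df = pd.DataFrame({"car": ["a", "b", "c"]}, index=[10, 11, 12])
s = pd.Series([100, 200, 300])  # default index 0, 1, 2

# Label alignment: no label in s matches 10, 11, or 12, so every value is NaN.
df["owners_aligned"] = s

# Positional assignment: converting to a NumPy array drops the index.
df["owners_positional"] = s.to_numpy()

print(df)
```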
