## Predictions: Regression for Car Mileage and Diamond Price


## Objectives

 - Use Pandas to load data sets.
 - Identify the target and features.
 - Use Linear Regression to build models that predict car mileage and diamond price.
 - Use metrics to evaluate the model.
 - Make predictions using a trained model.


## Datasets




In this lab you will use the following datasets:

 - A modified version of the car mileage (Auto MPG) dataset, available at https://archive.ics.uci.edu/ml/datasets/auto+mpg
 - A modified version of the diamonds dataset, available at https://www.openml.org/search?type=data&sort=runs&id=42225&status=active
 

----


## Setup

For this lab, you will use the following libraries:




*   [`pandas`](https://pandas.pydata.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for managing the data.
*   [`sklearn`](https://scikit-learn.org/stable/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for machine learning and machine-learning-pipeline related functions.


### Installing Required Libraries

The following required libraries are pre-installed in the Skills Network Labs environment. However, if you run the commands in this notebook in a different Jupyter environment (e.g. Watson Studio or Anaconda), you will need to install these libraries by removing the `#` sign before `!pip` in the code cell below.


In [None]:
#!pip install pandas==1.3.4
#!pip install scikit-learn==1.0.2
#!pip install numpy==1.21.6

In [None]:
# You can also use this section to suppress warnings generated by your code:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')
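The cell above silences every warning for the whole session by replacing `warnings.warn`. If you would rather keep warnings visible elsewhere, a more targeted option (used instead of that cell) is the standard library's `warnings.catch_warnings()` context manager, which restores the previous warning filters when the block exits. This is a minimal sketch; `noisy_step` is a hypothetical stand-in for any code that emits a warning.

```python
import warnings


def noisy_step():
    # Hypothetical stand-in for code that emits a warning (e.g. a library call).
    warnings.warn("this is an example warning", UserWarning)
    return 42


# Inside the block, all warnings are ignored; on exit the previous filters are restored.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_step()

print(result)  # 42
```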

### Importing Required Libraries


In [None]:
import pandas as pd
from sklearn.linear_model import LinearRegression

# Car Mileage Prediction

This section shows how to use a car dataset to train a regression model that predicts the mileage (MPG) of a car.

## Task 1 - Load the data from a CSV file into a DataFrame


In [None]:
# The dataset is available at the URL below.
URL = "https://blahblahblah.cloudstorage/datasets/mpg.csv"

# Using the read_csv function in the pandas library, we load the data into a DataFrame.

df = pd.read_csv(URL)
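Because `pd.read_csv` accepts any file-like object as well as a path or URL, you can try it without network access. The sketch below reads a tiny in-memory CSV through `io.StringIO`; the column names follow this lab (`MPG`, `Horsepower`, `Weight`), but the values are illustrative and the real file may contain additional columns.

```python
import io

import pandas as pd

# Illustrative rows only (hypothetical values); the lab loads mpg.csv from a URL instead.
csv_text = """MPG,Horsepower,Weight
18.0,130,3504
15.0,165,3693
31.0,65,1773
27.0,88,2130
"""

# read_csv accepts a path, a URL, or any file-like object such as StringIO.
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df.head())
```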

Let's look at some sample rows from the dataset we loaded:


In [None]:
# show 5 random rows from the dataset
df.sample(5)

Let's find out the number of rows and columns in the dataset:


In [None]:
df.shape
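`df.shape` returns a `(rows, columns)` tuple, so it can be unpacked directly. It is also worth checking for missing values at this point, since `LinearRegression.fit` raises an error when the features contain `NaN`. A minimal sketch on a small hypothetical DataFrame, with dropping incomplete rows shown as one simple option:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Horsepower value.
demo = pd.DataFrame({
    "MPG": [18.0, 15.0, 31.0, 27.0],
    "Horsepower": [130.0, np.nan, 65.0, 88.0],
    "Weight": [3504, 3693, 1773, 2130],
})

n_rows, n_cols = demo.shape      # unpack the (rows, columns) tuple
missing = demo.isna().sum()      # count of missing values per column
print(n_rows, n_cols)            # 4 3
print(missing)

# One simple option: drop rows that have any missing value before fitting.
clean = demo.dropna()
print(clean.shape)               # (3, 3)
```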

Let's create a scatter plot of Horsepower versus mileage (MPG). This will help us visualize the relationship between them.


In [None]:
df.plot.scatter(x = "Horsepower", y = "MPG")

You are encouraged to create more plots to visualize the relationships among other columns.
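Alongside scatter plots, a correlation matrix gives a quick numeric summary of how strongly each pair of columns moves together (`DataFrame.corr` uses Pearson correlation by default). A sketch on made-up values in which mileage falls as weight rises:

```python
import pandas as pd

# Hypothetical values chosen so that MPG falls linearly as Weight rises.
demo = pd.DataFrame({
    "Horsepower": [70, 95, 110, 150],
    "Weight": [2000, 2500, 3000, 3500],
    "MPG": [32.0, 27.0, 22.0, 17.0],
})

# Pairwise Pearson correlation between columns (values range from -1 to 1).
corr = demo.corr()
print(corr.round(2))
```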


## Task 2 - Identify the target column and the data columns


First, we identify the target: the value that our machine learning model needs to predict.


In [None]:
target = df["MPG"]

Next, we identify the features: the values our machine learning model learns from.


In [None]:
features = df[["Horsepower","Weight"]]
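Note the single brackets for the target and the double brackets for the features: `df["MPG"]` returns a one-dimensional `Series`, while `df[["Horsepower","Weight"]]` returns a two-dimensional `DataFrame`, which matches the `(n_samples, n_features)` shape scikit-learn expects for the feature matrix. A quick check on a small hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical rows using the same column names as the lab.
demo = pd.DataFrame({
    "MPG": [18.0, 31.0, 27.0],
    "Horsepower": [130, 65, 88],
    "Weight": [3504, 1773, 2130],
})

y = demo["MPG"]                        # single brackets -> Series (1-D)
X = demo[["Horsepower", "Weight"]]     # double brackets -> DataFrame (2-D)

print(type(y).__name__, y.shape)       # Series (3,)
print(type(X).__name__, X.shape)       # DataFrame (3, 2)
```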

## Task 3 - Build and Train a Linear Regression Model


Create a linear regression (LR) model.


In [None]:
lr = LinearRegression()

Train (fit) the model on the features and the target.


In [None]:
lr.fit(features,target)
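After fitting, the learned linear equation is stored on the model: `intercept_` holds the constant term and `coef_` holds one coefficient per feature, in the same order as the feature columns, so a prediction is `intercept_ + coef_[0] * Horsepower + coef_[1] * Weight`. The sketch below fits synthetic data generated from a known equation (the values 50, -0.05 and -0.005 are invented for illustration) and recovers it:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic (hypothetical) data generated exactly from MPG = 50 - 0.05*HP - 0.005*Weight.
X = pd.DataFrame({
    "Horsepower": [100, 150, 80, 120, 200],
    "Weight": [2000, 3000, 2500, 3500, 4000],
})
y = 50 - 0.05 * X["Horsepower"] - 0.005 * X["Weight"]

model = LinearRegression()
model.fit(X, y)

# One coefficient per feature column, in column order.
for name, coef in zip(X.columns, model.coef_):
    print(f"{name}: {coef:.4f}")
print(f"intercept: {model.intercept_:.2f}")
```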

## Task 4 - Evaluate the model and make predictions


Your model is now trained. Time to evaluate the model.


In [None]:
# The higher the score (R², at most 1.0), the better the model fits the training data.
lr.score(features,target)
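For `LinearRegression`, `score` returns the coefficient of determination R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of the target. A score of 1.0 means a perfect fit. Here the model is scored on the same data it was trained on, which tends to give an optimistic picture. A minimal sketch on synthetic (hypothetical) data that computes R² by hand and compares it with `score` and `sklearn.metrics.r2_score`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic (hypothetical) data: a noisy linear relationship.
rng = np.random.default_rng(0)
X = rng.uniform(50, 200, size=(40, 2))
y = 40 - 0.1 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0, 2, size=40)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, model.score(X, y), r2_score(y, y_pred))
```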

Make predictions. Let us predict the mileage for a car with Horsepower = 100 and Weight = 2000.


In [None]:
lr.predict([[100,2000]])

In [None]:
# The model predicts a mileage of 29.3216098 miles per gallon for a car with Horsepower = 100 and Weight = 2000.
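Two details about `predict`. First, the prediction is just the fitted linear equation evaluated at the input, `intercept_ + coef_ · x`. Second, because the model was fitted on a DataFrame, scikit-learn 1.0 and later may warn that X does not have valid feature names when you pass a plain list such as `[[100, 2000]]`; passing a one-row DataFrame with the same column names avoids that. A sketch on synthetic data generated from a hypothetical equation:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic (hypothetical) training data from MPG = 50 - 0.05*HP - 0.005*Weight.
X = pd.DataFrame({
    "Horsepower": [100, 150, 80, 120, 200],
    "Weight": [2000, 3000, 2500, 3500, 4000],
})
y = 50 - 0.05 * X["Horsepower"] - 0.005 * X["Weight"]
model = LinearRegression().fit(X, y)

# A one-row DataFrame with matching column names (no feature-name warning).
new_car = pd.DataFrame({"Horsepower": [100], "Weight": [2000]})
pred = model.predict(new_car)

# The same number computed directly from the fitted equation.
manual = model.intercept_ + np.dot(model.coef_, new_car.iloc[0].to_numpy())
print(pred[0], manual)
```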

# Diamond Price Prediction

In [None]:
URL2 = "https://blahblahblah.cloudstorage/datasets/diamonds.csv"

### Exercise 1 - Loading a dataset (diamond dataset)


In [None]:
df2 = pd.read_csv(URL2)

In [None]:
df2.sample(5)

### Exercise 2 - Identify the target column and the data columns


 - Use the `price` column as the target.
 - Use the `carat` and `depth` columns as the features.


In [None]:
target2 = df2["price"]

features2 = df2[["carat","depth"]]

### Exercise 3 - Build and Train a new Linear Regression Model


Create a new linear regression model and fit it using the diamond features and target.


In [None]:
lr2 = LinearRegression()
lr2.fit(features2, target2)

<details>
    <summary>Click here for a Hint</summary>
    
Fit the model using `features2` and `target2`.
</details>


<details>
    <summary>Click here for Solution</summary>

```python
lr2 = LinearRegression()
lr2.fit(features2, target2)

```

</details>


### Exercise 4 - Evaluate the model


Print the score (R²) of the model.


In [None]:
#your code goes here

lr2.score(features2, target2)
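The score above is computed on the same rows the model was trained on. To estimate how the model does on diamonds it has not seen, one common approach is to hold out part of the data with `sklearn.model_selection.train_test_split`, fit on the rest, and score on the held-out part; `mean_absolute_error` adds an error measure in the target's own units (price). A sketch on synthetic diamond-like data; the generating formula, sample size, and `random_state` values are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic (hypothetical) data: price driven mostly by carat, plus noise.
rng = np.random.default_rng(42)
n = 100
X = pd.DataFrame({
    "carat": rng.uniform(0.2, 2.0, size=n),
    "depth": rng.uniform(55, 65, size=n),
})
y = 8000 * X["carat"] - 50 * X["depth"] + rng.normal(0, 200, size=n)

# Hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)                          # R^2 on unseen rows
test_mae = mean_absolute_error(y_test, model.predict(X_test))  # average error in price units
print(f"test R^2: {test_r2:.3f}, test MAE: {test_mae:.1f}")
```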

### Exercise 5 - Predict the price of a diamond


Predict the price of a diamond with carat = 0.3 and depth = 60.


In [None]:
#your code goes here

lr2.predict([[0.3, 60]])
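`predict` also accepts several rows at once and returns one prediction per row, so you can price a batch of diamonds in a single call. Passing a DataFrame whose columns match the training columns (`carat`, `depth`) keeps the inputs labeled. A sketch with synthetic training data; the pricing formula is invented for illustration:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic (hypothetical) training data from price = 8000*carat - 50*depth + 3000.
train = pd.DataFrame({
    "carat": [0.3, 0.5, 0.7, 1.0, 1.5],
    "depth": [60.0, 62.0, 61.0, 63.0, 59.0],
})
price = 8000 * train["carat"] - 50 * train["depth"] + 3000
model = LinearRegression().fit(train, price)

# Three new diamonds, one per row; the columns must match the training columns.
new_diamonds = pd.DataFrame({
    "carat": [0.3, 0.8, 1.2],
    "depth": [60.0, 61.5, 62.0],
})
predictions = model.predict(new_diamonds)
print(predictions)  # one predicted price per row
```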