<h1 align=left style="line-height:200%;color:#0099cc">
<font color="#0099cc">
Pandas Exercise 2
<br>
Exhausted Cyclists
</font>
</h1>
<p dir=ltr style="direction: ltr;text-align: justify;line-height:200%;font-size:medium">
<font size=3>
Cyclists are very sensitive to temperature. The temperature that cyclists feel is usually directly related to wind speed and humidity. In this exercise, we use a dataset of temperatures to help a bike rental business decide how many bicycles to make available for rent on different days and at different temperatures.
</font>
</p>


<h2 align=left style="line-height:200%;color:#0099cc">
<font color="#0099cc">
Dataset
</font>
</h2>

<p dir=ltr style="direction: ltr;text-align: justify;line-height:200%;font-size:medium">
<font size=3>
By running the cell below, you can read the data for this exercise into a DataFrame. This dataset includes the following columns:
</font>
</p>


<center>
<div dir=ltr style="direction: ltr;line-height:200%;font-size:medium">
<font size=3>

| <b>Column Name</b> | <b>Description</b> |
| :---: | :---: |
| <code>cnt</code> | Number of Bicycles Rented per Day|
| <code>t1</code> | Actual Temperature Recorded on That Day|
| <code>t2</code> | Average Temperature Felt by Cyclists|
| <code>humidity</code> | Humidity on That Day|
| <code>wind_speed</code> | Wind Speed on That Day|
| <code>is_weekend</code> | Is it a Non-Working Day (Weekend)?|
| <code>season</code> | Season Number|

</font>
</div>
</center>

In [None]:
import pandas as pd
import numpy as np
from numpy import nan as NA

df = pd.read_csv('bikes_borrowed.csv')
df.head()

Unnamed: 0,cnt,t1,t2,humidity,wind_speed,is_weekend,season
0,182,3.0,2.0,93.0,6.0,1.0,3.0
1,138,3.0,2.5,93.0,5.0,1.0,3.0
2,134,2.5,2.5,96.5,0.0,1.0,3.0
3,72,2.0,2.0,100.0,0.0,1.0,3.0
4,47,2.0,0.0,93.0,6.5,1.0,3.0


<h2 align=left style="line-height:200%;color:#0099cc">
<font color="#0099cc">
Part One
</font>
</h2>


<p dir=ltr style="direction: ltr;text-align: justify;line-height:200%;font-size:medium">
<font size=3>
As you can see, the column names `t1` and `t2` are very generic and do not describe what their values mean, so it is worth renaming them.
In the cell below, rename the column `t1` to `t_real` and the column `t2` to `t_feels_like`.
To rename index labels or columns, we can use the `rename` method, as in the following code:
<br>
</font>
</p>

`df.rename(columns={"col1": "new_col1", "col2": "new_col2"} , inplace = True)`


In [None]:
df.rename(columns={"t1": "t_real", "t2": "t_feels_like"}, inplace=True)

df.head()

Unnamed: 0,cnt,t_real,t_feels_like,humidity,wind_speed,is_weekend,season
0,182,3.0,2.0,93.0,6.0,1.0,3.0
1,138,3.0,2.5,93.0,5.0,1.0,3.0
2,134,2.5,2.5,96.5,0.0,1.0,3.0
3,72,2.0,2.0,100.0,0.0,1.0,3.0
4,47,2.0,0.0,93.0,6.5,1.0,3.0


<p dir=ltr style="direction: ltr;text-align: justify;line-height:200%;font-size:medium">
<font size=3>
Pay attention to the parameter `inplace=True`. With this parameter, `rename` applies the changes directly to the original DataFrame instead of returning a modified copy and leaving the original untouched.

To see the difference, remove this parameter, reload the data, and run `df.head()` again to check whether the column names have changed.
</font>
</p>
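As a minimal sketch of this behaviour, the snippet below calls `rename` without `inplace=True` on a small toy DataFrame. The variable names `toy` and `renamed` and the values are illustrative assumptions, not part of the exercise dataset.

```python
import pandas as pd

# Toy frame with made-up values; only the column names mirror the exercise.
toy = pd.DataFrame({"t1": [3.0, 2.5], "t2": [2.0, 2.5]})

# Without inplace=True, rename returns a new DataFrame and leaves `toy` as it was.
renamed = toy.rename(columns={"t1": "t_real", "t2": "t_feels_like"})

print(list(toy.columns))      # ['t1', 't2']
print(list(renamed.columns))  # ['t_real', 't_feels_like']
```

Assigning the result back (`toy = toy.rename(...)`) is an alternative to `inplace=True` that has the same visible effect.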


<h2 align=left style="line-height:200%;color:#0099cc">
<font color="#0099cc">
Part Two
</font>
</h2>

<p dir=ltr style="direction: ltr;text-align: justify;line-height:200%;font-size:medium">
<font  size=3>
Because many of the days on which cyclists complain about the weather are windy, we will perform the calculations only for days with a wind speed greater than 10. In the DataFrame `windy_days_df`, store only the rows whose wind speed is strictly greater than 10 (10 itself is excluded).
</font>
</p>
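The strict comparison can be illustrated on a toy column. This is only a sketch: the names `toy`, `mask`, and `windy` and the values are assumptions chosen to show that a wind speed of exactly 10 is dropped.

```python
import pandas as pd

# Made-up wind speeds, including the boundary value 10.
toy = pd.DataFrame({"wind_speed": [6.0, 10.0, 10.5, 12.0]})

# The comparison produces a boolean Series; indexing with it keeps the True rows.
mask = toy["wind_speed"] > 10
windy = toy[mask]

print(mask.tolist())         # [False, False, True, True]
print(windy.index.tolist())  # [2, 3]
```

Note that the filtered rows keep their original index labels, which is why the index of a filtered DataFrame can have gaps.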

In [None]:
windy_days_df = df[df.wind_speed > 10]

windy_days_df.head()

Unnamed: 0,cnt,t_real,t_feels_like,humidity,wind_speed,is_weekend,season
10,528,3.0,-0.5,93.0,12.0,1.0,3.0
11,727,2.0,-1.5,100.0,12.0,1.0,3.0
12,862,2.0,-1.5,96.5,13.0,1.0,3.0
13,916,3.0,-0.5,87.0,15.0,1.0,3.0
15,869,2.0,-1.5,93.0,11.0,1.0,3.0


<h2 align=left style="line-height:200%;color:#0099cc">
<font color="#0099cc">
Part Three
</font>
</h2>


<p dir=ltr style="direction: ltr;text-align: justify;line-height:200%;font-size:medium">
<font size=3>

For the remaining calculations, we only need the columns related to temperature and the conditions that affect it. Therefore, in the DataFrame `windy_days_df`, keep only the columns `humidity`, `t_feels_like`, `t_real`, and `wind_speed`, and drop the rest.

</font>
</p>

In [None]:
windy_days_df = windy_days_df[['t_real','t_feels_like', 'humidity', 'wind_speed']]

windy_days_df.head()

Unnamed: 0,t_real,t_feels_like,humidity,wind_speed
10,3.0,-0.5,93.0,12.0
11,2.0,-1.5,100.0,12.0
12,2.0,-1.5,96.5,13.0
13,3.0,-0.5,87.0,15.0
15,2.0,-1.5,93.0,11.0


<p dir=ltr style="direction: ltr;text-align: justify;line-height:200%;font-size:medium">
<font size=3>
In the next steps, we want to change some values in the DataFrame. To make sure that neither the original DataFrame nor `windy_days_df` changes, we need to use the `copy` method.

In general, assigning a DataFrame to a new variable with the equal sign behaves as it does for NumPy arrays: no data is copied, and the new name simply refers to the same DataFrame. Therefore, a modification made through one name is visible through the other.

The `copy` method returns an independent copy of the DataFrame, so changes to one do not affect the other.

</font>
</p>
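The difference between plain assignment and `copy` can be sketched on a toy DataFrame. The names `original`, `alias`, and `independent` and the values are illustrative assumptions.

```python
import pandas as pd

original = pd.DataFrame({"t_real": [3.0, 2.0]})

alias = original               # second name for the same object, no data copied
independent = original.copy()  # separate object with its own data

# A change made through the alias is visible through the original name.
alias.loc[0, "t_real"] = -1.0

print(alias is original)               # True
print(original.loc[0, "t_real"])       # -1.0
print(independent.loc[0, "t_real"])    # 3.0
```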

In [None]:
temperature_df = windy_days_df.copy()

<h2 align=left style="line-height:200%;color:#0099cc">
<font color="#0099cc">
Part Four
</font>
</h2>



<p dir=ltr style="direction: ltr;text-align: justify;line-height:200%;font-size:medium">
<font  size=3>
To perform better analyses, we need to know how warm a day is relative to the overall temperature range. To do this, follow these steps:

1. Save the maximum value of the `t_real` column in the variable `t_max`.
2. Save the minimum value of the `t_real` column in the variable `t_min`.
3. Add a new column named `t_percent` to the DataFrame. In this column, use the following formula for normalization to find the relative temperature:
</font>
</p>

`((temp - min) / (max - min)) * 100`
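To see what this normalization does, here is a small sketch on three made-up temperatures (they are not read from the dataset). The minimum maps to 0, the maximum maps to 100, and every other value falls proportionally in between.

```python
import pandas as pd

# Made-up temperatures used only to illustrate the formula.
temps = pd.Series([-1.0, 3.0, 34.0])

t_min = temps.min()
t_max = temps.max()

# Min-max normalization scaled to a percentage of the observed range.
percent = (temps - t_min) / (t_max - t_min) * 100

print(percent.round(2).tolist())  # [0.0, 11.43, 100.0]
```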


In [None]:
t_max = np.max(temperature_df['t_real'])
t_min = np.min(temperature_df['t_real'])
temperature_df['t_percent'] = ((temperature_df['t_real'] - t_min) / (t_max - t_min)) * 100

temperature_df.head()

Unnamed: 0,t_real,t_feels_like,humidity,wind_speed,t_percent
10,3.0,-0.5,93.0,12.0,11.428571
11,2.0,-1.5,100.0,12.0,8.571429
12,2.0,-1.5,96.5,13.0,8.571429
13,3.0,-0.5,87.0,15.0,11.428571
15,2.0,-1.5,93.0,11.0,8.571429


<h2 align=left style="line-height:200%;color:#0099cc">
<font color="#0099cc">
Part Five
</font>
</h2>

<p dir=ltr style="direction: ltr;text-align: justify;line-height:200%;font-size:medium">
<font size=3>

Now, we want to predict the perceived temperature from the available information and overwrite the values in the `t_feels_like` column. Use the following formula to assign values to this column:
<br>
</font>
</p>

`t_feels_like = t_real + (humidity * t_real)/1000 - (wind_speed)/10 -2`
<br>
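As a quick check of the arithmetic, the formula can be evaluated by hand for one row. The helper name `feels_like` is a hypothetical wrapper introduced only for this sketch; the inputs are those of the first windy row shown earlier (`t_real` 3.0, `humidity` 93.0, `wind_speed` 12.0).

```python
def feels_like(t_real, humidity, wind_speed):
    """Hypothetical helper that applies the exercise's formula to scalar inputs."""
    return t_real + (humidity * t_real) / 1000 - wind_speed / 10 - 2


# 3.0 + 0.279 - 1.2 - 2 = 0.079
value = feels_like(3.0, 93.0, 12.0)
print(round(value, 3))  # 0.079
```

Applied to whole columns, the same expression is evaluated element by element, so no explicit loop is needed.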


In [None]:
temperature_df['t_feels_like'] = temperature_df['t_real'] + (temperature_df['humidity'] * temperature_df['t_real'])/1000 - (temperature_df['wind_speed'])/10 -2

temperature_df.head()

Unnamed: 0,t_real,t_feels_like,humidity,wind_speed,t_percent
10,3.0,0.079,93.0,12.0,11.428571
11,2.0,-1.0,100.0,12.0,8.571429
12,2.0,-1.107,96.5,13.0,8.571429
13,3.0,-0.239,87.0,15.0,11.428571
15,2.0,-0.914,93.0,11.0,8.571429


<p dir=ltr style="direction: ltr;text-align: justify;line-height:200%;font-size:medium">
<font size=3>
As you can see, the values in the `t_feels_like` column have changed and no longer match those in the original DataFrame. Now, let's verify the effect of the `copy` method. If you have followed all the steps correctly, the values in the `t_feels_like` column of `windy_days_df` should not have changed. To confirm this, we can compare the two DataFrames using the `head` method.

</font>
</p>

In [None]:
windy_days_df.head()

Unnamed: 0,t_real,t_feels_like,humidity,wind_speed
10,3.0,-0.5,93.0,12.0
11,2.0,-1.5,100.0,12.0
12,2.0,-1.5,96.5,13.0
13,3.0,-0.5,87.0,15.0
15,2.0,-1.5,93.0,11.0


<h2 align=left style="line-height:200%;color:#0099cc">
<font  color="#0099cc">
Part Six
</font>
</h2>



<p dir=ltr style="direction: ltr;text-align: justify;line-height:200%;font-size:medium">
<font size=3>
Finally, we want to measure how far this prediction is from the original values. For this purpose, we use the Mean Absolute Error (MAE), computed with the following expression:
<br>
</font>
</p>

`np.mean(np.abs(temperature_df['t_feels_like'] - windy_days_df['t_feels_like']))`
<br>
<p dir=ltr style="direction: ltr;text-align: justify;line-height:200%;font-size:medium">
<font size=3>
Calculate this value and store it in the variable named `difference`.
</font>
</p>
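A small sketch of the MAE on two made-up pairs of values shows what the expression computes. The Series names `predicted` and `observed` are assumptions for illustration. Because pandas aligns Series on their index labels before subtracting, both Series here share the same labels.

```python
import numpy as np
import pandas as pd

# Two made-up rows with matching index labels.
predicted = pd.Series([0.079, -1.0], index=[10, 11])
observed = pd.Series([-0.5, -1.5], index=[10, 11])

# Absolute differences are 0.579 and 0.5; their mean is 0.5395.
mae = float(np.mean(np.abs(predicted - observed)))
print(round(mae, 4))  # 0.5395
```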

In [None]:
difference = np.mean(np.abs(temperature_df['t_feels_like'] - windy_days_df['t_feels_like']))
print(difference)

2.028340505491597


<h2 align=left style="line-height:200%;color:#0099cc">
<font color="#0099cc">
Part Seven
</font>
</h2>


<p dir=ltr style="direction: ltr;text-align: justify;line-height:200%;font-size:medium">
<font size=3>
In this exercise, we used the `head(n)` method several times, so let's take a closer look at it.
This method is not only for displaying data: calling it on a DataFrame returns a new DataFrame containing the first n rows, which can be stored in a separate variable.
In the cell below, store the first 100 rows of `temperature_df` in the DataFrame `final_df`.
</font>
</p>
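The following sketch shows `head(n)` on a toy DataFrame whose index has a gap, similar to the filtered data above. The names `toy` and `first_three` and the values are illustrative assumptions.

```python
import pandas as pd

# Made-up rows with non-consecutive index labels.
toy = pd.DataFrame(
    {"t_real": [3.0, 2.0, 2.0, 3.0, 2.0]},
    index=[10, 11, 12, 13, 15],
)

# head(n) selects rows by position, not by label, and keeps the labels.
first_three = toy.head(3)
print(first_three.index.tolist())  # [10, 11, 12]

# Asking for more rows than exist simply returns all of them.
print(len(toy.head(100)))  # 5
```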

In [None]:
final_df = temperature_df.head(100)
final_df

Unnamed: 0,t_real,t_feels_like,humidity,wind_speed,t_percent
10,3.0,0.0790,93.0,12.0,11.428571
11,2.0,-1.0000,100.0,12.0,8.571429
12,2.0,-1.1070,96.5,13.0,8.571429
13,3.0,-0.2390,87.0,15.0,11.428571
15,2.0,-0.9140,93.0,11.0,8.571429
...,...,...,...,...,...
136,13.5,9.1475,85.0,35.0,41.428571
137,15.0,10.6300,82.0,36.0,45.714286
138,15.5,10.5935,77.0,41.0,47.142857
139,15.0,9.6800,72.0,44.0,45.714286
