# Recitation 14

___

**Important**: Keep in mind that if you create several versions of a class, only the **last defined class** will be graded by the autograder. Also, when you make modifications to a class, **any previously created objects will need to be recreated**.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### String Slicing in Pandas
To slice strings in pandas, use the `.str[start:stop]` syntax on a Series. For example, if `word` refers to a DataFrame column of strings, then `word.str[1:3]` returns the 2nd and 3rd characters of each entry in `word`.
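
As a minimal sketch, the slicing rule can be tried on a small made-up DataFrame. The column names `word` and `DATE` and their values are illustrative only and are not part of the assignment data.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "word": ["python", "pandas", "series"],
    "DATE": ["20190101", "20190207", "20191218"],
})

# .str[1:3] slices every string in the column (2nd and 3rd characters)
middle = df["word"].str[1:3]
print(middle.tolist())   # ['yt', 'an', 'er']

# The same idea extracts the MM part of a YYYYMMDD string
months = df["DATE"].str[4:6]
print(months.tolist())   # ['01', '02', '12']
```

Note that `df["word"][1:3]` without `.str` slices rows (positions 1 and 2), not characters.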

### Leading Zeros in f-Strings
Suppose you wish to convert `num=95` to a string that is 4 characters wide. Then `f'{num:4}'` returns `'  95'`, padded with leading blanks. To pad with leading zeros instead of blanks, use `f'{num:04}'`, which returns `'0095'`.
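
A short sketch of both format specifiers follows. The variable `month` is a hypothetical extra showing how a two-digit month string could be built the same way.

```python
num = 95

padded_blank = f'{num:4}'    # width 4, padded with blanks on the left
padded_zero = f'{num:04}'    # width 4, padded with zeros on the left
print(repr(padded_blank))    # '  95'
print(repr(padded_zero))     # '0095'

# Hypothetical use: turn a month number into a two-character string
month = 2
month_str = f'{month:02}'
print(month_str)             # 02
```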

___

### DailyTemp Class
**Create a `class` called `DailyTemp`** which will store the minimum and maximum daily temperatures for each day in a single year for a single weather station. It will contain the following attributes:

* `df`: a DataFrame containing the temperature data with column names `STATION_NAME`, `DATE`, `TMIN`, `TMAX`, `TAVG`
* `station`: name of the weather station
* `year`: year in `int` format

The `__init__()` method will do the following:
* **Take a csv filename as input.** The file will contain max/min temperature data with column headers named `STATION_NAME`, `DATE`, `TMIN`, `TMAX`.
* **Read in the file** and **convert into a DataFrame** `df`.
* **Drop all rows** containing missing values by calling `df.dropna(inplace=True)`.
* Each DATE is a number in YYYYMMDD format. **Convert the `DATE` column to `str` type.**
* **Add a column called `TAVG`** which equals the average of the TMIN and TMAX temperatures for each day.

**Examples**
```
temp19 = DailyTemp('boulder-temps-2019.csv')
temp19.df.iloc[350]
```
displays
```
STATION_NAME    BOULDER CO US
DATE                 20191218
TMIN                     19.0
TMAX                       54
TAVG                     36.5
Name: 351, dtype: object
```
and
```
temp19.station, temp19.year
```
displays
```
('BOULDER CO US', 2019)
```
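
The `Name: 351` in the first example is consistent with how `dropna()` behaves: it removes rows but keeps the original index labels, so position and label can differ afterwards. The sketch below uses a tiny made-up DataFrame (not the Boulder data) to illustrate this.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in the middle row
df = pd.DataFrame({"TMIN": [10.0, np.nan, 30.0], "TMAX": [20, 40, 50]})
df.dropna(inplace=True)

# The surviving rows keep their original labels 0 and 2
print(df.index.tolist())   # [0, 2]

# Position 1 now holds the row labelled 2
print(df.iloc[1].name)     # 2
```

For this reason, `.iloc[...]` (position) is the safer choice when you want "the first row" after dropping rows, since a label-based lookup fails if that label was dropped.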

In [7]:
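class DailyTemp:
    def __init__(self, filename):
        self.df = pd.read_csv(filename)
        self.df.dropna(inplace=True)
        # DATE is read as a number in YYYYMMDD format; store it as a string
        self.df["DATE"] = self.df["DATE"].astype(str)
        self.df["TAVG"] = (self.df["TMIN"] + self.df["TMAX"]) / 2
        # Use .iloc[0] (position) rather than [0] (label),
        # since dropna() may have removed the row labelled 0
        self.station = self.df["STATION_NAME"].iloc[0]
        self.year = int(self.df["DATE"].iloc[0][:4])

temp19 = DailyTemp('boulder-temps-2019.csv')
temp19.df.head()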

Unnamed: 0,STATION_NAME,DATE,TMIN,TMAX,TAVG
0,BOULDER CO US,20190101,2.0,15,8.5
1,BOULDER CO US,20190102,1.0,41,21.0
2,BOULDER CO US,20190103,22.0,56,39.0
3,BOULDER CO US,20190104,17.0,61,39.0
4,BOULDER CO US,20190105,32.0,60,46.0


**Add the following methods:**
* `year_stats()`: returns the `TAVG` mean, standard deviation (std), max, and min values for the year as a Series. Use `describe()`.
* `month_stats(month)`: similar to `year_stats()` except the stats for a given month are returned.
* `hottest_day()`: returns the date corresponding to the max temperature of the year. The format of the result should be 'Mmm DD, YYYY', with no leading zero for DD. If there is more than one date, return the first one in the DataFrame.
* `coldest_day()`: similar to `hottest_day()` except for the min temperature of the year.
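
As a rough sketch of the `describe()` step, the four requested statistics can be picked out of the result by label. The Series below holds the first five `TAVG` values from the `head()` output above, used only as sample input.

```python
import pandas as pd

# Sample input: a few TAVG values
tavg = pd.Series([8.5, 21.0, 39.0, 39.0, 46.0], name="TAVG")

# describe() returns count, mean, std, min, quartiles, and max;
# indexing with a list of labels keeps only the ones we want, in that order
stats = tavg.describe()[["mean", "std", "max", "min"]]
print(stats)
```

The result is a Series named `TAVG` whose index is `mean`, `std`, `max`, `min`.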

**Examples**

```
temp19.year_stats()
```
displays
```
mean    50.350275
std     17.303091
max     80.000000
min      5.500000
Name: TAVG, dtype: float64
```
and
```
temp19.coldest_day()
```
returns
```
'Feb 7, 2019'
```
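
One possible way to produce the 'Mmm DD, YYYY' format is sketched below. The helper name `format_date` and the toy data are hypothetical, and the choice of `TAVG` as the column to search is an assumption; adjust the column if the intended temperature is `TMIN` or `TMAX`.

```python
import pandas as pd

month_abbr = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def format_date(date_str):
    """Convert 'YYYYMMDD' to 'Mmm D, YYYY' (hypothetical helper)."""
    year = date_str[:4]
    month = month_abbr[int(date_str[4:6]) - 1]
    day = int(date_str[6:])          # int() drops any leading zero
    return f'{month} {day}, {year}'

# Toy data with a tie for the minimum (assumption: search the TAVG column)
df = pd.DataFrame({
    "DATE": ["20190206", "20190207", "20190208"],
    "TAVG": [12.0, 5.5, 5.5],
})

# idxmin() returns the label of the first occurrence of the minimum
coldest_label = df["TAVG"].idxmin()
coldest = format_date(df.loc[coldest_label, "DATE"])
print(coldest)   # Feb 7, 2019
```

Swapping `idxmin()` for `idxmax()` gives the corresponding lookup for the hottest day.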

In [6]:
# these month abbreviations are provided
month_abbr = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
# two-character month strings matching the MM part of DATE
numeric = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
month_index_dict = dict(zip(month_abbr, numeric))

class DailyTemp(DailyTemp):
    def year_stats(self):
        return self.df["TAVG"].describe()[["mean", "std", "max", "min"]]

    def month_stats(self, month):
        # month is an abbreviation such as "Jan"
        num = month_index_dict[month]
        # .str[4:6] slices each DATE string; plain [4:6] would slice rows
        # and produce a boolean Series that cannot be aligned with self.df
        month_df = self.df[self.df["DATE"].str[4:6] == num]
        return month_df["TAVG"].describe()[["mean", "std", "max", "min"]]

temp19 = DailyTemp('boulder-temps-2019.csv')
temp19.month_stats("Jan")

___

# Extra Problems
Work on these problems after completing the previous exercises.

**Add the following methods:**
* `plot(month)`: plots the daily temperatures for a given month. If the month is not provided, the annual data is used.
* `bar(month)`: displays a bar chart showing the TAVG for each day of the given month. If the month is not provided, a bar chart showing the average TAVG for each month of the year is displayed.

**Examples**  
The results for `.plot()`, `.plot(1)`, `.bar()`, `.bar(1)` are shown below.

<img src="http://www.coloradomath.org/python/temp_plot_yr.jpg" /> <img src="http://www.coloradomath.org/python/temp_plot_jan.jpg" />

<img src="http://www.coloradomath.org/python/temp_bar_yr.jpg" /> <img src="http://www.coloradomath.org/python/temp_bar_jan.jpg" />
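
The sketch below shows one way the data behind these charts could be prepared and drawn. It uses synthetic temperatures for two months rather than the Boulder file, and the helper `select_month` is hypothetical; the styling of the reference images is not reproduced.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: made-up temperatures for Jan and Feb 2019
dates = pd.date_range("2019-01-01", "2019-02-28").strftime("%Y%m%d")
df = pd.DataFrame({"DATE": dates})
df["TMIN"] = np.linspace(10, 30, len(df))
df["TMAX"] = df["TMIN"] + 20
df["TAVG"] = (df["TMIN"] + df["TMAX"]) / 2

def select_month(df, month=None):
    """Return the rows for one month (1-12), or all rows if month is None."""
    if month is None:
        return df
    return df[df["DATE"].str[4:6] == f'{month:02}']

jan = select_month(df, 1)
days = jan["DATE"].str[6:].astype(int)

# Average TAVG per month, grouped by the MM part of DATE
monthly_avg = df.groupby(df["DATE"].str[4:6])["TAVG"].mean()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(days, jan["TMIN"], label="TMIN")
ax1.plot(days, jan["TMAX"], label="TMAX")
ax1.set_xlabel("Day of month")
ax1.legend()
ax2.bar(list(monthly_avg.index), monthly_avg.values)
ax2.set_xlabel("Month")
ax2.set_ylabel("Average TAVG")
```

Using a default of `month=None` lets one method handle both the monthly and the annual case.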

