# **Working with CSV Data and the Pandas Library**

**Table of Contents:**
1. `CSV Library`
2. `Pandas Library`
   * `Pandas Data Type` 
     * `DataFrame (2-Dimensional)`
     * `Series (1-Dimensional)`
3. `Convert DataFrame Type to Other Data Type`
4. `Convert Series Type to Other Data Type`
5. `Get data in column (Case-Sensitive)`
6. `Get data in row`
7. `Get data from the row data`
8. `Create a DataFrame from scratch`
9. `Convert DataFrame from scratch to a CSV File`

### **Challenge-1**
Open the `weather_data.csv` file inside `main.py` and add each line of data to a list called `data`.

Use `.readlines()` to build the list. Each element is one line of the `.csv` file, including its trailing newline character.

In [15]:
with open("weather_data.csv") as data_file:
    data = data_file.readlines()
    print(data)

['day,temp,condition\n', 'Monday,12,Sunny\n', 'Tuesday,14,Rain\n', 'Wednesday,15,Rain\n', 'Thursday,14,Cloudy\n', 'Friday,21,Sunny\n', 'Saturday,22,Sunny\n', 'Sunday,24,Sunny']
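
The output above keeps the `\n` at the end of most lines. A minimal sketch of one way to drop it, assuming an in-memory `io.StringIO` stand-in (with only the first rows of the file) so that the example runs without `weather_data.csv` on disk:

```python
import io

# Stand-in for weather_data.csv (assumption: only the first rows are reproduced here).
csv_text = (
    "day,temp,condition\n"
    "Monday,12,Sunny\n"
    "Tuesday,14,Rain\n"
)

with io.StringIO(csv_text) as data_file:
    # Iterating over a file object yields one line at a time;
    # rstrip("\n") removes the line ending that readlines() would keep.
    data = [line.rstrip("\n") for line in data_file]

print(data)
```

With a real file, the same list comprehension should work inside `with open("weather_data.csv") as data_file:`.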


### **CSV Library**
**Alternative solution to the code above (from the course)**

In [14]:
import csv

with open("weather_data.csv") as data_file:
    data = csv.reader(data_file)  # Returns a reader object, not a list
    print(data)
    for row in data:  # Each iteration yields one row as a list of strings
        print(row)

<_csv.reader object at 0x0000019117262740>
['day', 'temp', 'condition']
['Monday', '12', 'Sunny']
['Tuesday', '14', 'Rain']
['Wednesday', '15', 'Rain']
['Thursday', '14', 'Cloudy']
['Friday', '21', 'Sunny']
['Saturday', '22', 'Sunny']
['Sunday', '24', 'Sunny']


### **Challenge-2**
Add the temperature from each row to a list, converted to integer format.

In [29]:
import csv

with open("weather_data.csv") as data_file:
    data = csv.reader(data_file)
    temperatures = []
    for row in data:
        if row[1] != "temp":
            temperatures.append(int(row[1]))
    print(temperatures)

[12, 14, 15, 14, 21, 22, 24]


**Alternative Solution**

In [36]:
import csv

with open("weather_data.csv") as data_file:
    data = csv.reader(data_file)
    temperatures = []
    for row in data:
        temperatures.append(row[1])
    temperatures.pop(0)

    final_temperatures = []
    for temp in temperatures:
        final_temperatures.append(int(temp))
    print(final_temperatures)

[12, 14, 15, 14, 21, 22, 24]
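
Both solutions above have to work around the header row. A possible third variant, sketched here with an in-memory stand-in for the file (an assumption so the example is self-contained), consumes the header with `next()` before the loop:

```python
import csv
import io

# Stand-in for weather_data.csv (assumption: only the first rows are reproduced here).
csv_text = "day,temp,condition\nMonday,12,Sunny\nTuesday,14,Rain\nWednesday,15,Rain\n"

with io.StringIO(csv_text) as data_file:
    reader = csv.reader(data_file)
    header = next(reader)  # Read and set aside the header row
    # Every remaining row is data, so no "temp" check or pop(0) is needed.
    temperatures = [int(row[1]) for row in reader]

print(header)
print(temperatures)
```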


### **Pandas Library**
Another way to read CSV files.

In [2]:
import pandas

df_data = pandas.read_csv("weather_data.csv") # DataFrame Type
print(type(df_data))

<class 'pandas.core.frame.DataFrame'>


In [48]:
print(df_data)

         day  temp condition
0     Monday    12     Sunny
1    Tuesday    14      Rain
2  Wednesday    15      Rain
3   Thursday    14    Cloudy
4     Friday    21     Sunny
5   Saturday    22     Sunny
6     Sunday    24     Sunny


In [49]:
s_data = df_data["temp"] # Series Type
print(type(s_data))

<class 'pandas.core.series.Series'>


In [50]:
print(s_data)

0    12
1    14
2    15
3    14
4    21
5    22
6    24
Name: temp, dtype: int64


### **Convert DataFrame Type To Dictionary**

In [51]:
data_dict = df_data.to_dict()
print(data_dict)

{'day': {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday', 4: 'Friday', 5: 'Saturday', 6: 'Sunday'}, 'temp': {0: 12, 1: 14, 2: 15, 3: 14, 4: 21, 5: 22, 6: 24}, 'condition': {0: 'Sunny', 1: 'Rain', 2: 'Rain', 3: 'Cloudy', 4: 'Sunny', 5: 'Sunny', 6: 'Sunny'}}
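
By default `to_dict()` nests the values by column and then by index, as the output above shows. If one dictionary per row is more convenient, `orient="records"` may be a better fit. A small sketch, assuming a two-row in-memory stand-in for the file:

```python
import io

import pandas

# Stand-in for weather_data.csv (assumption: only two rows are reproduced here).
csv_text = "day,temp,condition\nMonday,12,Sunny\nTuesday,14,Rain\n"
df_data = pandas.read_csv(io.StringIO(csv_text))

# orient="records" returns a list with one dict per row.
records = df_data.to_dict(orient="records")
print(records)
```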


### **Convert Series Type To List**

In [53]:
data_list = s_data.to_list()
print(data_list)

[12, 14, 15, 14, 21, 22, 24]


### **Challenge-3**
Calculate the average temperature from the `temp` column.

In [54]:
data_avg_temp = df_data["temp"].mean()
print(data_avg_temp)

17.428571428571427


**Alternative Solution**

In [55]:
data_avg_temp = sum(data_list) / len(data_list)
print(data_avg_temp)

17.428571428571427


### **Challenge-4**
Get the maximum temperature by using one of the Series methods.

In [56]:
data_max_temp = df_data["temp"].max()
print(data_max_temp)

24


### **Get Data In Column (Case-Sensitive: Uppercase & Lowercase Differ)**

In [58]:
print(df_data["condition"])

0     Sunny
1      Rain
2      Rain
3    Cloudy
4     Sunny
5     Sunny
6     Sunny
Name: condition, dtype: object


In [59]:
print(df_data.condition)

0     Sunny
1      Rain
2      Rain
3    Cloudy
4     Sunny
5     Sunny
6     Sunny
Name: condition, dtype: object
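
To illustrate the case sensitivity mentioned in the heading, here is a short sketch using a small hand-built DataFrame (an assumption, so the example does not need the CSV file). A column called `condition` cannot be reached as `Condition`, with either access style:

```python
import pandas

# Small stand-in DataFrame with the same column names as the weather data.
df_data = pandas.DataFrame({
    "day": ["Monday", "Tuesday"],
    "condition": ["Sunny", "Rain"],
})

# Bracket access with the wrong capitalisation raises KeyError.
try:
    df_data["Condition"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Attribute access is case-sensitive as well.
has_capitalised_attr = hasattr(df_data, "Condition")

print(lookup_failed, has_capitalised_attr)
```

Attribute access such as `df_data.condition` only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so a name containing spaces needs the bracket form.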


### **Get Data In Row**

In [60]:
print(df_data[df_data.day == "Monday"])

      day  temp condition
0  Monday    12     Sunny


**Get every row whose condition is Sunny**

In [3]:
print(df_data[df_data.condition == "Sunny"])

        day  temp condition
0    Monday    12     Sunny
4    Friday    21     Sunny
5  Saturday    22     Sunny
6    Sunday    24     Sunny


**Get the row that holds the highest temperature**

In [61]:
print(df_data[df_data.temp == df_data["temp"].max()])

      day  temp condition
6  Sunday    24     Sunny


### **Get Data From The Row Data**

In [4]:
monday = df_data[df_data.day == "Monday"]
print(monday)

      day  temp condition
0  Monday    12     Sunny


**Get the condition column from the selected row**

In [5]:
print(monday.condition)

0    Sunny
Name: condition, dtype: object


In [8]:
sunny = df_data[df_data["condition"] == "Sunny"]
print(sunny)

        day  temp condition
0    Monday    12     Sunny
4    Friday    21     Sunny
5  Saturday    22     Sunny
6    Sunday    24     Sunny


In [11]:
print(sunny.temp[0])

12


In [12]:
print(sunny.temp[1]) # Error: label 1 is not in the index of sunny, which keeps the original labels (0, 4, 5, 6)

KeyError: 1

In [14]:
print(sunny.temp[4])

21
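
The cells above show that `[]` on a filtered Series looks values up by the original index label. A sketch of the explicit alternatives, rebuilding the weather data by hand so the example is self-contained: `.loc` selects by label, `.iloc` selects by position, and `reset_index(drop=True)` renumbers the rows.

```python
import pandas

# Weather data from this notebook, rebuilt by hand instead of read from the CSV.
df_data = pandas.DataFrame({
    "day": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"],
    "temp": [12, 14, 15, 14, 21, 22, 24],
    "condition": ["Sunny", "Rain", "Rain", "Cloudy", "Sunny", "Sunny", "Sunny"],
})
sunny = df_data[df_data["condition"] == "Sunny"]

by_label = sunny["temp"].loc[4]       # Row whose index label is 4 (Friday)
by_position = sunny["temp"].iloc[1]   # Second row of the filtered result (also Friday)

# Renumber the filtered rows 0..3 so that label and position agree again.
renumbered = sunny.reset_index(drop=True)

print(by_label, by_position, list(renumbered.index))
```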


### **Challenge-5**
Convert Monday's temperature to Fahrenheit.

Hint: use `[]` to get a single value from the pandas Series by index.

In [66]:
monday_temp = monday.temp[0] # Output: 12
monday_temp_F = monday_temp * 9/5 + 32 # Output 53.6
print(monday_temp_F)

53.6


**Alternative Solution (returns a one-element Series instead of a single value)**

In [71]:
def f(c):
    c = c * 1.8 + 32
    return c

monday_temp = monday["temp"].apply(f)
print(monday_temp)

0    53.6
Name: temp, dtype: float64
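
The same formula can also be applied without a helper function. The sketch below, built on a two-row stand-in DataFrame (an assumption for self-containment), pulls out a single value by position with `.iloc[0]` and, separately, converts the whole column at once with ordinary arithmetic on the Series:

```python
import pandas

# Two-row stand-in for the weather data.
df_data = pandas.DataFrame({"day": ["Monday", "Tuesday"], "temp": [12, 14]})
monday = df_data[df_data["day"] == "Monday"]

# Single value: take the first (and only) temperature by position, then convert.
monday_temp_f = monday["temp"].iloc[0] * 9 / 5 + 32

# Whole column: arithmetic on a Series is applied to every element.
all_temps_f = df_data["temp"] * 9 / 5 + 32

print(monday_temp_f)
print(all_temps_f)
```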


### **Create a DataFrame From Scratch**

In [72]:
import pandas
data_dict = {
    "students": ["Amy", "James", "Angela"],
    "scores": [76, 56, 65],
}

data = pandas.DataFrame(data_dict)
print(data)

  students  scores
0      Amy      76
1    James      56
2   Angela      65


### **Create a CSV File From The DataFrame**

In [73]:
data.to_csv("new_data.csv")
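
By default `to_csv()` also writes the row index as an unnamed first column. If that column is not wanted, `index=False` leaves it out. A small sketch that compares the two; calling `to_csv()` without a path returns the CSV text as a string, which is used here only so that nothing is written to disk:

```python
import pandas

data = pandas.DataFrame({
    "students": ["Amy", "James", "Angela"],
    "scores": [76, 56, 65],
})

# With no path argument, to_csv() returns the CSV content as a string.
with_index = data.to_csv()
without_index = data.to_csv(index=False)

print(with_index)
print(without_index)
```

Writing to a file works the same way, for example `data.to_csv("new_data.csv", index=False)`.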

### **Challenge-6**
Count the squirrels of each Primary Fur Color, build a DataFrame from the counts, then save it as a CSV file.

In [113]:
import pandas as pd

df_data = pd.read_csv("2018_Central.csv")

gray_squirrel_count = len(df_data[df_data["Primary Fur Color"] == "Gray"])
cinnamon_squirrel_count = len(df_data[df_data["Primary Fur Color"] == "Cinnamon"])
black_squirrel_count = len(df_data[df_data["Primary Fur Color"] == "Black"])

fur_color_dict = {
    "Fur Color": ["Gray", "Cinnamon", "Black"],
    "Count": [gray_squirrel_count, cinnamon_squirrel_count, black_squirrel_count]
}

df = pd.DataFrame(fur_color_dict)
df.to_csv("squirrel_count.csv")

**Alternative Solution**

In [116]:
# Optional: view the column as a categorical Series (not used by the groupby below)
s_data = df_data["Primary Fur Color"].astype("category")
# print(s_data)

squirrel_fur_color = df_data.groupby("Primary Fur Color").size().reset_index(name="counts")
# print(squirrel_fur_color)

squirrel_fur_color.to_csv("squirrel_fur_color.csv")
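
Another way to count the colors might be `Series.value_counts()`. The sketch below uses a tiny made-up sample in place of `2018_Central.csv` (the rows are hypothetical and only illustrate the call), and the output column names `Fur Color` and `Count` are chosen to match the first solution:

```python
import pandas

# Hypothetical sample standing in for 2018_Central.csv; the real file has many more rows.
df_data = pandas.DataFrame({
    "Primary Fur Color": ["Gray", "Gray", "Cinnamon", "Black", "Gray", None],
})

# value_counts() counts each distinct value and skips missing entries by default.
counts = df_data["Primary Fur Color"].value_counts()

# Turn the Series into a two-column DataFrame ready for to_csv().
fur_color_counts = counts.rename_axis("Fur Color").reset_index(name="Count")

print(fur_color_counts)
```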