# Outline of Lab:

### 1. Introduction: Transition from Sankey
 - "You've now *visualized* the water transmission between sources and utilities and acquired an intuition as to why you saw the changes that you saw. Now you'll look at this data in bar plots alongside energy transmission data and analyze the ratio of energy consumption to the volume of transported water..."

### 2. Introduce data
- <div style = "color: orange">**NOTE:** I made a new csv so students don't have to worry about the 2010 and 2015 data being separate, combining them, and then filtering out unnecessary water targets and years. The new csv is called `network.csv`.</div>
- Take a moment to explain the data. What do the columns mean? Do we need any more columns? What do "source" and "target" mean? What does the "year" column mean?
    - create the new "Total Energy" column here!
- *Note*: Intro notebook MUST cover `.where()` and `.group()` functions thoroughly.
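
A minimal sketch of the "Total Energy" column, assuming total energy is the cumulative volume multiplied by the transmission energy intensity. This uses pandas rather than the `datascience` Table API the students will see. The column name `total_energy_kwh` and the 163.1 kWh/AF intensity paired with these two rows are assumptions for illustration.

```python
import pandas as pd

# Two illustrative rows in the shape of network.csv
# (the transmission values are assumed for this sketch)
network = pd.DataFrame({
    "year": ["2010", "2010"],
    "source": ["1801001PD", "1801007PD"],
    "target": ["1801001E", "1801007E"],
    "cumulative_volume_af": [1624.0, 2015.2],
    "transmission_kwh/af": [163.1, 163.1],
})

# Assumption: total energy (kWh) = volume (AF) * energy intensity (kWh/AF)
network["total_energy_kwh"] = (
    network["cumulative_volume_af"] * network["transmission_kwh/af"]
)
print(network[["target", "total_energy_kwh"]])
```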

### 3. Analysis Example!
- Conduct an example analysis on the Cerritos utility. Make sure to **explain** what you're doing and why: what will the students take away from this analysis? Ask discussion questions to make sure students are following, and **provide your own interpretation**.
- As a part of the analysis, plot a bar graph with 2010, predicted 2015, and actual 2015 data.
- *Note:* Make sure to interpret the graph at some point. Students will be looking to see an example of how you analyze the graph to do it themselves.
- Finally, compare and **discuss** the ratio of total energy to cumulative water volume across 2010, predicted 2015, and actual 2015.
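
The ratio comparison could be sketched as follows in pandas. The numbers, the source labels, and the utility code `0000000E` are made up for illustration, and the sketch assumes that per-source rows for a utility can simply be summed within each year label.

```python
import pandas as pd

# Made-up rows for one hypothetical utility code; not real lab data
network = pd.DataFrame({
    "year": ["2010", "2010", "2015", "2015", "Predicted 2015", "Predicted 2015"],
    "source": ["S1", "S2", "S1", "S2", "S1", "S2"],
    "target": ["0000000E"] * 6,
    "cumulative_volume_af": [100.0, 50.0, 60.0, 40.0, 110.0, 50.0],
    "transmission_kwh/af": [200.0, 100.0, 200.0, 300.0, 200.0, 100.0],
})
network["total_energy_kwh"] = (
    network["cumulative_volume_af"] * network["transmission_kwh/af"]
)

# Keep one utility, then total the volume and energy for each year label
one_utility = network[network["target"] == "0000000E"]
totals = one_utility.groupby("year")[
    ["cumulative_volume_af", "total_energy_kwh"]
].sum()

# Energy used per acre-foot of water delivered (kWh/AF)
totals["energy_per_af"] = totals["total_energy_kwh"] / totals["cumulative_volume_af"]
print(totals)
```

If matplotlib is installed, `totals["energy_per_af"].plot.bar()` would draw the three bars side by side.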

### 4. Students try their own utility!
- provide some skeleton code, have students produce the same bar graphs & look at energy to water ratio, write down their interpretation, answer discussion questions.

### 5. Big Picture: what did this look like overall across California?
- Use all utilities to conduct the same analysis
- discuss effect of drought, ratio
- Discussion questions: ask if students' utilities experienced the same trend.

### 6. Conclusion:
- Peer consulting (resource)
- Survey

---

# Notes for Mina (4/4/19)

1. Data:
    - Keiko to get rid of unnecessary columns
    - Explain that the **`year`** column only contains three values: 2010, 2015, and Predicted 2015
    - Explain what "Predicted 2015" is
    - Explain that the **`target`** column contains **only** water utilities, while the **`source`** column contains any water source that provides water to those target utilities.
    * Explain what students will be doing with the data!

2. `.where()` function: briefly go over what it does
    - In the intro notebook, Keiko will provide examples of how to extract dates and utilities from a dataframe so students are prepared to do this in your lab

3. `.sort()` function: don't need! Get rid of this

4. Cerritos Example: 
    - output dataframes for 2010, provide a blurb on what to notice
    - same for 2015
    - Quickly compare values you see in 2010 and in 2015

5. Students try their own:
    - provide same structure as above
    - provide relevant discussion questions
    
6. Move onto "Further Analysis".
    - Explain what you will do for further analysis (maybe finish making this part of the lab and come back to writing the description)
    - Explain why you are introducing the formulas: why do you need it / what will you do with it
    - Go right into visualization
    - analyze visualization -- what should you notice?
    - Calculate relevant difference / percent difference measures
    - Ratio analysis -- make sure you explain why you're doing this thoroughly
    
7. Students try their own! Restructure order so it follows the same flow as the Cerritos example.
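
For the percent difference measures mentioned in item 6, one possible definition is the change of the actual value relative to the predicted value. This is a sketch under that assumption: the function name `percent_difference` is hypothetical and the numbers are made up.

```python
def percent_difference(predicted, actual):
    """Percent change of the actual value relative to the predicted value."""
    if predicted == 0:
        raise ValueError("predicted value must be nonzero")
    return (actual - predicted) / predicted * 100.0


# Made-up volumes in acre-feet: actual 2015 came in below the prediction
print(percent_difference(predicted=160.0, actual=100.0))
```

A negative result means the actual value fell short of the prediction, and a positive result means it exceeded the prediction.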

# Documentation: Making Class Data for Analysis

We were given two data files: the 2010 and 2015 network data. Here, we walk through the steps we took to construct the class data from them.

**Idea:**
- The `year` column should contain the string values "2010", "2015", and "Predicted 2015".
- The `target` column should contain ONLY utility codes, not any other water source.
- To simplify the table, keep only the following columns:
    - year, source, target, cumulative_volume_af, transmission_kwh/af


In [1]:
import pandas as pd
import numpy as np

Since both 2010 and 2015 data have the same structure, we can simply concatenate the two to make one large data frame which we call `data` below

In [2]:
data2010 = pd.read_csv('network2010.csv')
data2015 = pd.read_csv('network2015.csv')
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way
data = pd.concat([data2010, data2015])

We want the `target` column to contain only utility codes. Utility codes are the only codes that end with the letter `E` after the 7-digit number. We first make a list of all utility codes using this condition, then filter the data to keep only the rows whose `target` is one of those codes.

In [3]:
#Extracting all utility codes
utilities = [utility for utility in data2010['target'].unique() if "E" == utility[-1]]

# Filtering DataFrame
util_data = data[data['target'].isin(utilities)]

In [13]:
# Look at the table here as an example when reading the blurb on year columns below
util_data.iloc[1000:1005,:]

| Unnamed: 0 | year | data_year | source | target | cumulative_volume_af | transmission_kwh/af | treatment_kwh/af | used_vol_af |
|---|---|---|---|---|---|---|---|---|
| 2707 | 2015 | 2010 | 1807413PD | 1807413E | 11616.72 | 163.1 | 0.0 | 11616.72 |
| 2708 | 2015 | 2010 | 1807415NPD | 1807415E | 45.0 | 195.72 | 0.0 | 45.0 |
| 2709 | 2015 | 2010 | 1807415PD | 1807415E | 8837.0 | 163.1 | 0.0 | 8837.0 |
| 2711 | 2015 | 2010 | 1807417PD | 1807417E | 26536.95 | 163.1 | 0.0 | 26536.95 |
| 2713 | 2015 | 2010 | 1807419NPD | 1807419E | 266.993604 | 195.72 | 0.0 | 266.993604 |


`data_year` indicates when the record was entered into the data, while `year` indicates the year that the row's values describe. For example (above), `data_year` can be 2010 and `year` can be 2015 in the same row: this indicates that the predictions for 2015 were made in 2010.

Note that since we only have recorded data from 2010 and 2015, `data_year` only contains 2010 and 2015. However, for each of these years, there were many predictions made about the future, so `year` contains many different years. For the purposes of our lab, we were only interested in 2010 values (actual values) and 2015 values (either actual or predicted).

To construct our desired table, we split the data into two tables: one containing predicted 2015 values, and one containing actual 2010 and actual 2015 values. We then relabeled the values so that the new, simplified `year` column contains one of three values:

- "2010"
- "2015"
- "Predicted 2015"
   

In [19]:
# Keeping only 2010 and 2015 data under year column

util_data = util_data[util_data['year'].isin([2010, 2015])]

In [63]:
# If data was entered in 2010 but the year reads 2015, then the values correspond to Predicted 2015
# .copy() gives an independent table, so assigning to 'year' below does not trigger SettingWithCopyWarning
predicted = util_data[(util_data['data_year'] == 2010) & (util_data['year'] == 2015)].copy()

# If data_year says 2015, corresponding records MUST be actual 2015 data (given filter in cell above)
# If year says 2010, corresponding records MUST be actual 2010 data
actual_data = util_data[(util_data['data_year'] == 2015) | (util_data['year'] == 2010)].copy()

# Relabeling the year column with string values
predicted['year'] = "Predicted 2015"
actual_data['year'] = actual_data['year'].astype(str)

| Unnamed: 0 | year | source | target | cumulative_volume_af | total_energy_kwh |
|---|---|---|---|---|---|
| 0 | 2010 | 1801001PD | 1801001E | 1624.0 | 264874.4 |
| 1 | 2010 | 1801007PD | 1801007E | 2015.2 | 328679.12 |
| 2 | 2010 | 1801009PD | 1801009E | 2924.55 | 476994.105 |
| 3 | 2010 | 1801011PD | 1801011E | 1350.0 | 220185.0 |
| 4 | 2010 | 1801015PD | 1801015E | 840.0 | 137004.0 |


In [61]:
# Appending the actual and predicted data into one large table
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
data = pd.concat([actual_data, predicted])



In [None]:
# Keeping only necessary columns
data = data.loc[:, ['year', 'source', 'target', 'cumulative_volume_af', 'transmission_kwh/af']]

# Saving the final class data as csv titled "network.csv"
data.to_csv('network.csv', index = False)
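
As a quick sanity check on the goals listed under **Idea**, the saved table could be checked along these lines. This is a sketch only: the helper `check_class_data` is hypothetical, and the sample rows reuse two of the predicted-2015 rows shown earlier instead of reading `network.csv` from disk.

```python
import pandas as pd

EXPECTED_YEARS = {"2010", "2015", "Predicted 2015"}
EXPECTED_COLUMNS = ["year", "source", "target",
                    "cumulative_volume_af", "transmission_kwh/af"]


def check_class_data(df):
    """Return a list of problems found; an empty list means the table looks right."""
    if list(df.columns) != EXPECTED_COLUMNS:
        return ["unexpected columns"]
    problems = []
    # The year column should hold only the three string labels
    if not set(df["year"].astype(str)) <= EXPECTED_YEARS:
        problems.append("unexpected year labels")
    # Utility codes are the codes that end with the letter E
    if not df["target"].astype(str).str.endswith("E").all():
        problems.append("target contains non-utility codes")
    return problems


sample = pd.DataFrame({
    "year": ["Predicted 2015", "Predicted 2015"],
    "source": ["1807413PD", "1807415NPD"],
    "target": ["1807413E", "1807415E"],
    "cumulative_volume_af": [11616.72, 45.0],
    "transmission_kwh/af": [163.1, 195.72],
})
print(check_class_data(sample))
```

To check the real file, `check_class_data(pd.read_csv('network.csv'))` should work the same way, assuming the file was written by the cell above.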