# Solar Production Data Analysis

## Objective
This notebook analyzes solar production data using the data science lifecycle outlined here:
https://docs.google.com/document/d/1gKr6j7s02hjilNhqYrjDK7t6jqR092hzezn-MB09EZo/edit#heading=h.mbjsiz6n6jlo

## Contents:
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment

Python version: 3.9.0

Requirements: see imports

## 1 Business Understanding
### Sections
#### 1.1 Question Definition
First, define your question(s) of interest from a business perspective. 
#### 1.2 Success Criteria
What is the goal of the project you propose, and what are the criteria for a successful or useful outcome? Assess the situation, including resources, constraints, assumptions, requirements. 
#### 1.3 Statistical Criteria 
Then, translate business objective(s) and metric(s) for success to data mining goals. If the business goal is to increase sales, the data mining goal might be to predict variance in sales based on advertising money spent, production costs, etc. 
#### 1.4 Project Plan
Finally, produce a project plan specifying the steps to be taken throughout the rest of the project, including initial assessment of tools and techniques, durations, dependencies, etc.

What is the goal of this project?
What are the criteria for a successful outcome?

The business goals are:
What will the production of a new system be?
What will the performance of a new system be?

Other questions to answer if time permits:
What is the quality of the data?
How well do we understand the fleet: how many systems are there, and when were they installed?
Can we identify and predict poor performance?
How might we improve performance?
Can we predict future production?



## 1.1 Question Definition
First, define your question(s) of interest from a business perspective.

Solar systems convert light energy from the sun into electricity. The amount of electricity produced is called "production", and the amount actually produced relative to the expected amount is called "performance".
A commercial and residential solar installer wants to know whether they can predict the production and performance of a new solar system. They have provided production and performance data for their fleet in the hope that it can serve as training data for a model. The model and analysis should report basic information, such as fleet growth, average performance, and top-performing systems, as well as more advanced statistics, such as whether the financial company is correlated with performance, which months are best for installing a new system, and the best estimate of annual degradation.
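To make the definition of "performance" concrete, here is a minimal sketch that assumes performance is expressed as the ratio of actual to expected production. The function name `performance_ratio` and the figures are hypothetical and only illustrate the idea; the installer's data may define performance differently.

```python
def performance_ratio(actual_kwh: float, expected_kwh: float) -> float:
    """Return actual production as a fraction of expected production.

    Assumption: performance is a simple ratio; this is not confirmed by the data provider.
    """
    if expected_kwh <= 0:
        raise ValueError("expected_kwh must be positive")
    return actual_kwh / expected_kwh


# Hypothetical figures: a system expected to produce 1,000 kWh that produced 950 kWh.
ratio = performance_ratio(950.0, 1000.0)
print(f"{ratio:.0%}")  # prints "95%"
```

A ratio below 1 means the system underperformed its expectation, and a ratio above 1 means it exceeded it.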

## 1.2 Success Criteria
What is the goal of the project you propose, and what are the criteria for a successful or useful outcome? Assess the situation, including resources, constraints, assumptions, requirements. 

#### 1.2.1 Goal
#### 1.2.2 Criteria for success
#### 1.2.3 Resources
#### 1.2.4 Constraints
#### 1.2.5 Assumptions
#### 1.2.6 Requirements

## 1.3 Statistical Criteria 
Then, translate business objective(s) and metric(s) for success to data mining goals. If the business goal is to increase sales, the data mining goal might be to predict variance in sales based on advertising money spent, production costs, etc.

## Load Data

In [None]:
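import pandas as pd
df = pd.read_csv("solar_production.csv")  # hypothetical path and format; point this at the installer's data file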

## 2 Data Understanding
### 2.1 List all datasets required
Collect initial data and list all datasets acquired, locations, methods of acquisition, and any problems encountered. 
### 2.2 Gross properties of the data
Describe the gross properties of the data, including its format, shape, field identities, etc.
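As a rough sketch of how the gross properties might be inspected with pandas, the example below builds a tiny stand-in DataFrame. The snake_case column names are assumptions derived from the fields listed in section 2.3, and the values are made up.

```python
import pandas as pd

# Hypothetical sample; column names and values are assumptions, not the real dataset.
sample = pd.DataFrame({
    "portfolio": ["A", "A", "B"],
    "holding_co": ["H1", "H1", "H2"],
    "project_co": ["P1", "P1", "P2"],
    "contract_id": [101, 101, 202],
    "date": pd.to_datetime(["2020-01-31", "2020-02-29", "2020-01-31"]),
    "production": [410.5, 455.0, 388.2],
})

# Shape: number of rows (observations) and columns (fields).
n_rows, n_cols = sample.shape
print(n_rows, n_cols)

# Field identities and their data types.
print(sample.dtypes)

# A first look at the records themselves.
print(sample.head())
```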
### 2.3 Feature Analysis
Explore key attributes, simple statistics, visualizations. Identify potential relationships and interesting data characteristics to inform initial hypotheses. 
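One possible starting point for the simple statistics mentioned above is `describe()` for an overall summary and `groupby()` for per-system figures. The column names `contract_id`, `date`, and `production` and the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample with two systems and two monthly readings each.
sample = pd.DataFrame({
    "contract_id": [101, 101, 202, 202],
    "date": pd.to_datetime(["2020-01-31", "2020-02-29", "2020-01-31", "2020-02-29"]),
    "production": [400.0, 500.0, 300.0, 350.0],
})

# Overall summary statistics for the production column.
summary = sample["production"].describe()
print(summary)

# Per-system statistics: number of readings, mean, and total production.
per_system = sample.groupby("contract_id")["production"].agg(["count", "mean", "sum"])
print(per_system)
```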
#### 2.3.1 Portfolio
#### 2.3.2 Holding Co
#### 2.3.3 Project Co
#### 2.3.4 Contract id
#### 2.3.5 Date
#### 2.3.6 Production
### 2.4 Summary on Data Quality
In this section, I'll examine the quality of the data, e.g. completeness, consistency, formatting, and report any potential problems and solutions.
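The checks below are a minimal sketch of what completeness and consistency checks could look like. It assumes that each system (`contract_id`) should have at most one reading per date and that production should never be negative; both rules, the column names, and the values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample containing missing values, a duplicated reading, and a negative value.
sample = pd.DataFrame({
    "contract_id": [101, 101, 101, 202],
    "date": pd.to_datetime(["2020-01-31", "2020-02-29", "2020-02-29", "2020-01-31"]),
    "production": [400.0, np.nan, np.nan, -5.0],
})

# Completeness: count missing values in each column.
missing_per_column = sample.isna().sum()

# Consistency: count repeated (contract_id, date) pairs beyond the first occurrence.
duplicate_rows = sample.duplicated(subset=["contract_id", "date"]).sum()

# Plausibility: count negative production readings (missing values are not counted).
negative_production = (sample["production"] < 0).sum()

print(missing_per_column)
print(duplicate_rows, negative_production)
```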



# 3 Data Preparation
## 3.1 Feature Selection
Determine which data will be used (selection of attributes/columns and observations/rows) and document reasons for inclusion or exclusion. 
## 3.2 Data Cleaning
Clean the data and describe the actions taken. Techniques could include selecting subsets for examination and filling missing values with defaults or model-based estimates. Note outliers/anomalies and the potential impact of these transformations on analysis results. 
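As an illustration of these techniques, the sketch below flags outliers with the common 1.5 × IQR rule and fills missing values with the median of the non-outlier readings. Both choices are assumptions made for the example rather than a recommended method for this dataset, and the column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical readings for one system, with one missing value and one extreme value.
sample = pd.DataFrame({
    "contract_id": [101] * 6,
    "production": [400.0, 420.0, np.nan, 410.0, 430.0, 4000.0],
})

# Flag outliers with the 1.5 * IQR rule (quantiles ignore missing values).
q1 = sample["production"].quantile(0.25)
q3 = sample["production"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
sample["is_outlier"] = sample["production"].notna() & ~sample["production"].between(lower, upper)

# Record which rows were imputed so the impact on later analysis can be assessed.
sample["was_imputed"] = sample["production"].isna()

# Fill missing values with the median of the non-outlier readings.
median_clean = sample.loc[~sample["is_outlier"], "production"].median()
sample["production_clean"] = sample["production"].fillna(median_clean)

print(sample)
```

Keeping the `is_outlier` and `was_imputed` flags alongside the cleaned column makes it possible to rerun the analysis with and without the affected rows.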
## 3.3 Feature Engineering
It may be useful to derive new attributes from the data as combinations of existing attributes, and describe their creation. It may also be useful to merge or aggregate datasets, in which case you should be careful of duplicate values. 
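The sketch below shows one way to derive new attributes and merge datasets while guarding against duplicates. The `systems` table with an `install_date` column is hypothetical, as are the column names and values; `validate="many_to_one"` makes pandas raise an error if `contract_id` is duplicated in the right-hand table, which would otherwise silently multiply rows.

```python
import pandas as pd

# Hypothetical production readings.
production = pd.DataFrame({
    "contract_id": [101, 101, 202],
    "date": pd.to_datetime(["2020-01-31", "2020-02-29", "2020-01-31"]),
    "production": [400.0, 500.0, 300.0],
})

# Hypothetical per-system attributes; this table is an assumption, not part of the provided data.
systems = pd.DataFrame({
    "contract_id": [101, 202],
    "install_date": pd.to_datetime(["2019-06-15", "2019-12-01"]),
})

# Merge, asserting that each contract_id appears at most once in `systems`.
merged = production.merge(systems, on="contract_id", how="left", validate="many_to_one")

# Derived attributes: calendar month of the reading and system age in days.
merged["month"] = merged["date"].dt.month
merged["days_since_install"] = (merged["date"] - merged["install_date"]).dt.days

print(merged)
```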
## 3.4 Data Reformatting
Finally, re-format the data as necessary (e.g. shuffling the order of inputs to a neural network or making syntactic changes like trimming field lengths).

# 4 Modeling
## 4.1 Model Selection
Select and document any modeling techniques to be used (regression, decision tree, neural network) along with assumptions made (uniform distribution, data type). 
## 4.2 Test Design
Before building the model, generate test design - will you need to split your data (e.g. into training, test, and validation sets), and if so, how? 
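For time-indexed data such as production readings, one option is a chronological split, so that the model is always evaluated on dates later than those it was trained on. The sketch below assumes a 60/20/20 split and hypothetical column names; a random split may be more appropriate for other questions.

```python
import pandas as pd

# Hypothetical monthly readings for a single system.
sample = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=10, freq="MS"),
    "production": [float(v) for v in range(10)],
})

# Sort by date so that slicing by position is also slicing by time.
sample = sample.sort_values("date").reset_index(drop=True)

# Integer arithmetic avoids floating-point surprises in the split points.
n = len(sample)
train_end = n * 6 // 10
valid_end = n * 8 // 10

train = sample.iloc[:train_end]
valid = sample.iloc[train_end:valid_end]
test = sample.iloc[valid_end:]

print(len(train), len(valid), len(test))
```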
## 4.3 Parameter Tuning
Next, run the selected modeling tool(s) on your data, list parameters used with justifications, and describe and interpret resulting models. 
## 4.4 Model Evaluation
Generally, you want to run different models with different parameters, then compare the results across the evaluation criteria established in earlier steps. Assess the quality or accuracy of each model, revise and tune iteratively.
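As a small illustration of comparing models against a shared evaluation criterion, the sketch below scores a mean baseline and a linear trend on held-out data using mean absolute error. The synthetic data, the two candidate models, and the choice of metric are all assumptions made for the example.

```python
import numpy as np

# Hypothetical yearly production (kWh) for one system, declining by 5 kWh per year.
years = np.arange(8, dtype=float)
production = 1000.0 - 5.0 * years

# Chronological hold-out: train on the first six years, test on the last two.
train_x, test_x = years[:6], years[6:]
train_y, test_y = production[:6], production[6:]


def mae(y_true, y_pred):
    """Mean absolute error between two arrays."""
    return float(np.mean(np.abs(y_true - y_pred)))


# Model A: always predict the training mean.
baseline_pred = np.full_like(test_y, train_y.mean())

# Model B: straight-line trend fitted by least squares.
slope, intercept = np.polyfit(train_x, train_y, deg=1)
trend_pred = slope * test_x + intercept

scores = {
    "mean_baseline": mae(test_y, baseline_pred),
    "linear_trend": mae(test_y, trend_pred),
}
best = min(scores, key=scores.get)
print(scores, best)
```

On this perfectly linear synthetic data the trend model wins trivially; on real data the gap between candidates is what the evaluation step needs to quantify.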


# 5 Evaluation
## 5.1 Results
Summarize the results of the previous step in business terms - how well were your business objective(s) met? Models that meet business success criteria become approved models. 
## 5.2 Summary
After selecting appropriate models, review the work accomplished. Make sure models were correctly built, no important factors or tasks were overlooked, and all data are accessible/replicable for future analyses. 
## 5.3 Next Steps
Depending on assessed results, decide next steps, whether to move on to deployment, initiate further iterations, or move on to new data mining projects.

# 6 Deployment
## 6.1 Deployment Plan
Develop a plan for deploying relevant model(s). 
## 6.2 Monitoring Plan
Further develop a monitoring and maintenance plan to ensure correct usage of data mining results and avoid issues during the operational phase of model(s). 
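One simple form a monitoring check could take is comparing recent predictions against actual readings and flagging the model for review when the error exceeds a threshold. The function names, the use of mean absolute percentage error, and the 10% threshold below are all hypothetical choices for illustration.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, skipping zero actual values."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("need at least one non-zero actual value")
    return sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def needs_review(actual, predicted, max_error=0.10):
    """Return True when the recent error exceeds the (assumed) tolerance."""
    return mean_absolute_percentage_error(actual, predicted) > max_error


# Hypothetical recent readings and the model's predictions for the same periods.
recent_actual = [400.0, 500.0, 300.0]
recent_predicted = [380.0, 450.0, 360.0]

error = mean_absolute_percentage_error(recent_actual, recent_predicted)
flagged = needs_review(recent_actual, recent_predicted)
print(f"{error:.3f}", flagged)
```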
## 6.3 Conduct Retrospective
Summarize results, conclusions and deliverables. Conduct any necessary retrospective analysis of what went well, what could be improved, and general experience documentation.