# **Solar Images and Convolutional Neural Networks**
---

## **Project Overview**

Solar physics relies heavily on the analysis of solar image data. Accessing these images can be challenging, and preparing them for machine learning is a significant barrier to using them in machine learning applications. A pipeline from the solar databases to trainable data would make this data much easier to use in the solar physics space. For this project we will stand up a solar image data pipeline that will allow us to train a network on data from different wavelengths. We will accomplish this by breaking into two sub-teams, one tackling the pipeline and one the network. Finally, once the pipeline is established, we will use a custom CNN to classify wavelengths and work to extend this result to composite images as well. _(As with the last project, we will also contribute to the other sub-team through discussion posts.)_

<p style="text-align: center;"> <img src="sun.jpg" width="300" alt="[img: https://sdo.gsfc.nasa.gov/data/]"/> <br /> Image source: https://sdo.gsfc.nasa.gov/data/</p>


### **Project Goals:** 
**Data:** Create a pipeline using Python to convert solar data into machine-learning-ready training data. This training data should initially focus on our machine learning task, but additional considerations about the data can be included for future projects.

**Machine Learning:** Use a convolutional neural network (CNN) to predict the wavelength of an image. Then, given a composite of three wavelengths, can the network determine what the wavelengths are?

### **Project Objectives**
- Use Python and our lab to access solar data.
- Determine what data our machine learning pipeline should focus on.
- Collect and curate the necessary data _(training and testing, features and targets)_ to prove the concept of our machine learning task. 
- Train a CNN to predict wavelengths given an image. Use the trained weights to find the spectrum of a composite image.  
- Develop a legacy project scope outlining three future goals for other research teams.

### **Timeline**

| Monday       | Tuesday     | Wednesday   | Thursday  | Friday |
| :------------- | :---------- | :----------- | :----------- | :----------- |
|  |     |   |  | April 9 <br /> Introduce the Project and <br />SunPy Introduction| 
| April 12 <br /> Data Augmentation <br /> **Solar Image Post Due**   |  Office Hour| April 14 <br />  Bounding Box Data | Office Hour | April 16 <br /> _Pipeline Team Presents_   | 
| April 19 <br /> The Composite Problem  <br /> **Network Architecture Post Due**  | Office Hour | April 21 <br />Explore Convolution Layers | Office Hour | April 23 <br /> _Network Team Presents_ <br /> Train the network (over the weekend) | 
| April 26 <br /> Project Reflection <br /> **Reflection Post Due** |  |  |  |  |

### **Deliverables**
- Each team member must post on the **Solar Image Post** and the **Network Architecture Post**. Each post is due on D2L by 11:59 pm on its Monday: April 12 (**Solar**) and April 19 (**Network**). Below you will find more information on the posts (this info is also on D2L in the discussions). (20 points)
- Each team member must present with their team on the day that their team presents: **Pipeline team (April 16) and Network team (April 23)**. (10 points)
- Each team member must write up a one-page project reflection and post it in the discussion forum **(April 26)**. Below you will find more information on the reflection. (10 points)

### **Sub-Teams:**

_Pipeline Team_ (Presentation date April 16)
- Joe
- Christian
- Eddie
- Anton

_Network Team_ (Presentation date April 23)
- Cassidy
- Stacie
- Joanna
- Jack

---
## **Sub-Teams Project Overview**

### **Pipeline Sub-team**
The focus of the pipeline team is to use SunPy to collect the needed solar data and to curate a data set of training and testing data. This team will also identify the wavelengths that we will study, make any choices on data augmentation, and help build the composite images that we will use in the final phase of testing. Pipeline will need to work with Network to connect the network to the training and testing data. They will also need to collaborate with the global team on how to predict the composite image.
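
As a starting point for the composite step, here is a minimal sketch of one way to combine three single-wavelength images into one multi-channel image. It assumes each image is already loaded as a 2-D NumPy array of the same shape (for example, the `.data` array of a SunPy map). The names `normalize` and `make_composite` and the choice of min-max scaling are illustrative assumptions, not a decision the team has made.

```python
import numpy as np


def normalize(image):
    """Min-max scale a 2-D array to [0, 1]; a constant image maps to zeros."""
    image = np.asarray(image, dtype=np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


def make_composite(channel_a, channel_b, channel_c):
    """Stack three same-shaped single-wavelength images into shape (H, W, 3)."""
    channels = [normalize(c) for c in (channel_a, channel_b, channel_c)]
    shapes = {c.shape for c in channels}
    if len(shapes) != 1:
        raise ValueError(f"channel shapes differ: {shapes}")
    # Each wavelength becomes one channel along the last axis.
    return np.stack(channels, axis=-1)
```

Scaling each channel separately keeps one bright wavelength from washing out the others, but it also discards relative intensity between wavelengths, so the team may prefer a different normalization.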

**Presentation target:** What wavelengths are we training for and how many images have been collected for training and testing? Where is the data located and what are the sizes of the images? Where is the classification information? What will the first layer of the CNN need to be? **How was the composite image made?** What do some example images look like, and are there any issues that we should be aware of? What data augmentation was used? Is the data set up for any additional tasks?

### **Network Sub-team**
The network team will help to build and deploy a CNN that can be used to classify the wavelengths identified by the Pipeline team. This team will use known architectures to develop the CNN and will help attach the data pipeline to the network for training. Once the network is trained, Network will work in collaboration with Pipeline to create a novel multi-channel image whose spectrum the trained network can try to determine.
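
One possible way to read a composite prediction, sketched here only as a discussion starter: if the trained network produces one score per wavelength class, the three highest-scoring classes could be reported as the guessed spectrum. The function name `top_k_wavelengths`, the class list, and the scores below are all hypothetical; the actual approach is for the teams to decide.

```python
def top_k_wavelengths(scores, wavelengths, k=3):
    """Return the k wavelengths with the highest scores, best first."""
    if len(scores) != len(wavelengths):
        raise ValueError("scores and wavelengths must have the same length")
    ranked = sorted(zip(scores, wavelengths), key=lambda pair: pair[0], reverse=True)
    return [wavelength for _, wavelength in ranked[:k]]


# Hypothetical class list (in angstroms) and a made-up network output.
wavelengths = [94, 131, 171, 193, 211, 304]
scores = [0.05, 0.35, 0.22, 0.02, 0.08, 0.28]
print(top_k_wavelengths(scores, wavelengths))  # prints [131, 304, 171]
```

Note that a softmax output forces the scores to compete with each other, so a network trained on single-wavelength images may need a different output layer or a per-channel prediction strategy for this to work well.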

**Presentation target:** What is the structure of the network and why? What will the output of the network be? **How will we predict the composite image?** How long will it take to train the network? How will the weights be saved, and what size should we expect the entire model to be? When loaded onto the GPUs, how much RAM is needed?
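
For the model-size question, a rough estimate can be made by counting parameters by hand. The sketch below assumes float32 weights (4 bytes each) and a small hypothetical network; the layer sizes are placeholders, not the team's architecture.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus biases for a 2-D convolution with a square kernel."""
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(in_features, units):
    """Weights plus biases for a fully connected layer."""
    return in_features * units + units


def model_size_mb(total_params, bytes_per_param=4):
    """Approximate weight storage in MB, assuming float32 parameters."""
    return total_params * bytes_per_param / 1e6


# Hypothetical network: two 3x3 convolutions, global average pooling, six classes.
total = (
    conv2d_params(3, 1, 32)      # single-channel input -> 32 filters
    + conv2d_params(3, 32, 64)   # 32 -> 64 filters
    + dense_params(64, 6)        # pooled features -> 6 wavelength classes
)
print(total, "parameters, about", round(model_size_mb(total), 3), "MB of weights")
```

This covers only the stored weights. Memory during training is larger because activations, gradients, and optimizer state also need to fit, and those grow with batch size and image size.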

---
## **Discussion Posts Overview**

### **Solar Image Post**

For this discussion post, include one image of the sun and the code used to download the image. Make sure that the image is of a wavelength that has not yet been posted. You don't need to include all the code here, just the `Fido.search` line with the attributes that you used. In addition to an image of the entire disk, post a zoomed-in image of a piece of the sun as well as the code used to isolate that region. _(Should be two images and two blocks of code.)_
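
The two code blocks for the post might look roughly like the sketch below. It assumes AIA data, a short one-minute search window, and a crop rectangle given in helioprojective arcseconds; the helper names (`search_window`, `download_full_disk`, `crop_region`) are hypothetical, and the download step needs network access plus SunPy and Astropy installed.

```python
from datetime import datetime, timedelta


def search_window(start, seconds=60):
    """Return (start, end) ISO strings for a short search window."""
    begin = datetime.fromisoformat(start)
    end = begin + timedelta(seconds=seconds)
    return begin.isoformat(), end.isoformat()


def download_full_disk(start, wavelength_angstrom):
    """Search for AIA data in a short window and load the first file as a map."""
    # Imported here so search_window can be used without SunPy installed.
    import astropy.units as u
    import sunpy.map
    from sunpy.net import Fido, attrs as a

    t0, t1 = search_window(start)
    result = Fido.search(
        a.Time(t0, t1),
        a.Instrument.aia,
        a.Wavelength(wavelength_angstrom * u.angstrom),
    )
    files = Fido.fetch(result[0, 0])  # first record from the first provider
    return sunpy.map.Map(files[0])


def crop_region(full_map, x_range, y_range):
    """Cut out a rectangle whose corners are given in arcseconds."""
    import astropy.units as u
    from astropy.coordinates import SkyCoord

    frame = full_map.coordinate_frame
    bottom_left = SkyCoord(x_range[0] * u.arcsec, y_range[0] * u.arcsec, frame=frame)
    top_right = SkyCoord(x_range[1] * u.arcsec, y_range[1] * u.arcsec, frame=frame)
    return full_map.submap(bottom_left, top_right=top_right)
```

For example, `download_full_disk("2021-04-09 00:00:00", 171)` followed by `crop_region(full_map, (-400, 0), (-400, 0))` would give a full-disk map and a cropped region; the date, wavelength, and coordinates here are arbitrary placeholders. Calling `.peek()` on either map displays it.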

### **Network Architecture Post**

For this discussion, post the Keras summary of a CNN architecture that could be used for the general network. Also include information on where the structure for the network came from (source paper or reference). In addition to the summary, include the optimizer, batch size, and number of epochs, as well as any notes that you would like the network team (or the global team) to be aware of.
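
To show the kind of content a post might include, here is a minimal sketch of a small Keras CNN and the settings to report with it. The input shape, layer sizes, number of classes, and training settings are placeholders chosen for illustration, not a recommended architecture; a real post should cite the source the structure came from.

```python
def build_wavelength_cnn(input_shape=(128, 128, 1), num_classes=6):
    """Build a small illustrative CNN; all sizes here are placeholders."""
    # Imported lazily so this file can be imported without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential(
        [
            keras.Input(shape=input_shape),
            layers.Conv2D(32, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(64, 3, activation="relu"),
            layers.GlobalAveragePooling2D(),
            layers.Dense(num_classes, activation="softmax"),
        ]
    )
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Hypothetical training settings to report alongside the summary.
TRAINING_SETTINGS = {"optimizer": "Adam", "batch_size": 32, "epochs": 20}
```

Calling `build_wavelength_cnn().summary()` prints the layer table to paste into the post.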

---
## **Reflection Overview**

For this reflection we are looking backwards as well as forward. We want to use what we have learned through the scope of this project to build a runway for additional exploration in the solar image analysis space.

Questions:
- What challenges were there working with this data? What are some techniques used to help overcome these challenges?
- Was the network successful with the classification task? If yes, could it be better? If no, can it be salvaged?
- What was the conclusion of predicting on composite data? 
- What was one new topic, idea or tool that you feel better equipped to use? What was one new topic, idea or tool that you would like to explore more?
- Propose three additional questions that we could investigate using the solar data. Would we need more data?
- Any additional thoughts about the project or conclusion.