# Introduction to Coding and Data Analysis for Scientists


## Week 21: Questions & Reflection

## Programming project (assessment)

The final programming project (40%) is now available. It has two components: code (20%) and a written report (20%).

**Due: Wednesday 26 March 2025 at 12 midday.**

- Details are available on the [Introductory Scientific Computing](https://www.ole.bris.ac.uk/webapps/blackboard/content/listContentEditable.jsp?content_id=_8257128_1&course_id=_254910_1&mode=reset) Blackboard course - ["Assessment, submission and feedback"](https://www.ole.bris.ac.uk/webapps/blackboard/content/listContentEditable.jsp?content_id=_8257140_1&course_id=_254910_1&mode=reset) course content area.


- See also the notes for Tutorial 7


- Submit using **Blackboard** on the "Assessment, submission and feedback" tab.

## Today's session (Week 21)

**Aim**: Reflect on your scientific computing and provide further detail on the final project


1. Reflect on what we have learnt this year

2. Review common questions

3. Discuss how to further develop your computing after the class

4. Open the floor for any questions people would like to discuss

## Project Questions 

### Report

Common questions:

- *How should the report be formatted?*

- *How long should the report be?*

### Report

*How should the report be formatted?*

> For the report, we’d like you to write this like an **investigative report** where a hypothesis has been tested based on the data available. This includes an **abstract, introduction, analysis and discussion, conclusions and any references** you have used.

> **Note:** You can use Microsoft Word or LaTeX for submission. **There is no penalty for using Microsoft Word**.

> **Tip:** Start by thinking about what to put under each section heading so you have an idea of the structure you’re aiming for.

### Report

*Should we include code and plot/outputs?*

> You should not include any explicit code **within the report**.

> Code will be submitted in a notebook alongside the report (**this is also assessed!**). 

> In the report, you should describe the method and approach you took and any hypotheses you tested. You can also refer to the code document if you like.

> **Plots/outputs should be included in the report** when they illustrate results you want to show.

### Report

*How long should the report be?*

> This should be a substantial piece of work, so you should aim for your report to be **approximately 1000-2000 words or 3-6 sides of A4** (including plots). This is not a hard limit, but where possible you should provide adequate detail while staying concise and focused. Please stick to the **suggested structure** within the project outline.

In [None]:
#!pip install lorem
import lorem

# Generate Lorem Ipsum text and keep the first 200 words
lorem_text = lorem.text().split()[:200]  # At most 200 words (fewer if the generated text is shorter)
lorem_text = " ".join(lorem_text)  # Join the words back into a single string

print(lorem_text)

### Coding component

- *What output do we need to produce?*

- *How much detail do we need in our code document?*

### Coding component


*What output do we need to produce?*

 > You must submit a Jupyter notebook containing all of your code.
 


 > In addition, if you feel it makes sense for your code, you can include `.py` modules containing functions you have written which were too large to include in the notebook. 
 


 > There will be marks for code organisation, so think carefully about how you structure your work!
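As a rough illustration of the `.py` module option, the sketch below shows what such a module might contain. The file name `analysis_tools.py` and the function `running_mean` are hypothetical, chosen only for this example; they are not part of the project brief.

```python
# Contents of a hypothetical module file, analysis_tools.py.
# The file name and the function are made up for illustration.
import numpy as np


def running_mean(values, window):
    """Return the running mean of `values` over `window` consecutive points.

    Parameters
    ----------
    values : sequence of float
        The data to smooth.
    window : int
        Number of consecutive points to average over (must be >= 1).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    # mode="valid" keeps only the positions where the window fits entirely
    return np.convolve(values, kernel, mode="valid")


# In the notebook you would then write:
#     from analysis_tools import running_mean
smoothed = running_mean([1, 2, 3, 4, 5], window=2)
print(smoothed)
```

Keeping a docstring on each function means the notebook can stay short while the module still explains itself.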

### Coding component


*How much detail do we need in our code document?*

> The coding component should be **self-contained** and include enough detail to be understood without referencing the report. This means you must highlight the steps taken, comment the code and explain any code used to create plots.

### Overall



- For each question, think about what approach you want to take and what hypothesis you are testing. Make sure you're addressing the main question.

- If you're unsure how to approach a problem, break it down into smaller steps and build it up. 

- Focus on your key results. Although all questions should be addressed, don't feel you have to equally present everything.

- Ensure you understand what your model is producing. It is advisable to start with simple, understandable settings or analyses before attempting the full solution.

- For Option 2, make sure to look over the data, e.g. look at the file itself, check some overall statistics and make plots, even if they don't get included within your final submission.
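A first look at a dataset might go something like the sketch below. The column names and values are invented for illustration and are not the project data; with a real file you would load it with `pd.read_csv` instead of building the table by hand.

```python
import pandas as pd

# Made-up measurements standing in for a real data file.
# With a real file you would start from something like pd.read_csv("your_file.csv").
data = pd.DataFrame(
    {
        "temperature": [12.1, 13.4, None, 15.2, 14.8],
        "pressure": [1012.0, 1009.5, 1011.2, 1008.7, 1010.1],
    }
)

print(data.head())    # first few rows: do the columns look as expected?
print(data.dtypes)    # has each column been read as the right type?

missing = data.isna().sum()    # number of missing values in each column
print(missing)

summary = data.describe()      # count, mean, std, min, quartiles and max
print(summary)
```

A quick `data.plot()` or `data.hist()` is a natural next step once the numbers look sensible.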

## Your questions

Please use this opportunity to ask any additional questions you have for the remainder of the session. 


 - These can be about the project (within reason)

 - Or about the material covered so far.

 - Or anything else coding-related.

<img src="images/mentimeter_qr_code.png" width="500">

## This year

With what you've already learnt you can use Python to:

- Perform numerical computation

- Execute logical operations using loops and control flow statements

- Read and interpret data from files

- Make different types of plots

## Material for today



1. Requested topics from previous years:
  

<ul style="margin-left: 2em; list-style-type: circle;">
    <li><a href="../../Course_SCIF10002_2024/Week21/Week21_topics.ipynb">Week21_topics</a></li>
</ul>

<ul style="margin-left: 2em; list-style-type: circle;">
    <li><a href="../../Course_SCIF10002_2024/Week21/Week21_topic_examples.ipynb">Week21_topic_examples</a></li>
</ul>

2. Requested material from last week:

 <ul style="margin-left: 2em; list-style-type: circle;">
    <li><a href="Supplementary_Datatypes_Loops_and_Logic.ipynb">Supplementary_Datatypes_Loops_and_Logic</a></li>
</ul>

 <ul style="margin-left: 2em; list-style-type: circle;">
    <li><a href="Supplementary_Pandas.ipynb">Supplementary_Pandas</a></li>
</ul>

## Where can you take your scientific computing?

### Download Python+VSCode

Visual Studio Code is a free code editor developed by Microsoft and built on an open-source code base. It lets you create and edit notebooks conveniently on Windows, macOS and Linux.

Go to: https://code.visualstudio.com/


To install a full Python environment, follow this guide:

https://www.raillyhugo.com/blog/how-to-setup-python-environment

For macOS users, a custom Python installation via `pyenv` is highly recommended:

- Pyenv: https://github.com/pyenv/pyenv


Some minimal knowledge of the Terminal is required to install custom Python versions and packages via `pip`. However, you will learn a lot following this route and gain a better understanding of how your computer works.


### Download Anaconda

<img src="images/anaconda-logo.png" alt="Anaconda logo" style="display:block;margin-left:auto;margin-right:auto;width:25%"/>


An alternative is to install Anaconda (Individual Edition) on your own computer.

This is free to download and use and has Jupyter notebooks (and JupyterLab) built in.

Go to: https://www.anaconda.com/download

 - Download the appropriate version for your operating system (the site may detect this for you).

### Analysing data


<img src="images/pandas.svg" alt="Pandas logo" style="display:block;margin-left:auto;margin-right:auto;width:15%"/>

We have covered the fundamentals for using the `pandas` module in this course and have started to look at some more complex topics. If you've found this to be useful, you can continue learning more about pandas for data analysis and the other methods it provides.

Start off with:

#### 10 minutes to pandas


This gives an overview of some key pandas concepts, including some topics we have covered and some we haven't. Work through this for an overview:



- https://pandas.pydata.org/docs/user_guide/10min.html

#### More concepts in pandas

- [**Split-apply-combine** methods](https://pandas.pydata.org/docs/user_guide/groupby.html) (including grouping, [resampling](https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-resampling) and [window](https://pandas.pydata.org/docs/user_guide/window.html) methods)

- **Binning data** (e.g. [cut function](https://pandas.pydata.org/docs/reference/api/pandas.cut.html))


- **Using functions** across whole rows/columns (including [apply method](https://pandas.pydata.org/docs/user_guide/basics.html#row-or-column-wise-function-application), [string methods](https://pandas.pydata.org/docs/user_guide/text.html#text-string-methods), [datetime methods](https://pandas.pydata.org/docs/user_guide/basics.html#dt-accessor))

- **Styling** your data table to look how you want (e.g. [Styling your DataFrame](https://pandas.pydata.org/docs/user_guide/style.html))
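To give a flavour of the first three ideas in this list, here is a minimal sketch on a small made-up table. The column names, bin edges and labels are assumptions chosen for the example.

```python
import pandas as pd

# Made-up example data (site names and readings are hypothetical)
df = pd.DataFrame(
    {
        "site": ["north", "north", "south", "south", "south"],
        "reading": [2.0, 4.0, 10.0, 20.0, 30.0],
    }
)

# Split-apply-combine: split by site, apply the mean, combine into one Series
site_means = df.groupby("site")["reading"].mean()
print(site_means)

# Binning: label each reading by the interval it falls into.
# The bins are (0, 5], (5, 15] and (15, 50].
df["level"] = pd.cut(
    df["reading"], bins=[0, 5, 15, 50], labels=["low", "medium", "high"]
)

# String methods: operate on a whole column of text at once
df["site_label"] = df["site"].str.upper()

# apply: run a function on every value in a column
df["reading_doubled"] = df["reading"].apply(lambda value: 2 * value)

print(df)
```

For simple arithmetic like doubling, `df["reading"] * 2` does the same job more directly; `apply` earns its place when the function is more involved.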

### Visualisation


#### Pandas.plotting submodule



<img src="images/scatter_matrix_kde.png" alt="Pandas plotting - scatter matrix" style="display:block;margin-left:auto;margin-right:auto;width:30%"/>



In addition to the plotting tools we have used in pandas so far, there is a [pandas.plotting submodule](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#plotting-tools) with further tools such as this *scatter matrix*, which shows correlations between the data properties by combining scatter plots with kernel density plots.
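A plot of this kind could be produced along the following lines. The data here are synthetic random numbers, generated only so the example runs on its own, and the kernel density option needs scipy to be installed.

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic data: column "b" is built to correlate with "a", "c" is independent
rng = np.random.default_rng(seed=0)
a = rng.normal(size=200)
df = pd.DataFrame(
    {
        "a": a,
        "b": 2 * a + rng.normal(scale=0.5, size=200),
        "c": rng.normal(size=200),
    }
)

# Scatter plots off the diagonal, kernel density estimates on the diagonal
axes = scatter_matrix(df, diagonal="kde", figsize=(6, 6))
```

The function returns a grid of matplotlib axes, one per pair of columns, so individual panels can be customised afterwards.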

#### Matplotlib and seaborn



For more plotting techniques and options in Python see these tutorials for matplotlib and the seaborn package:

 - Matplotlib tutorials - https://matplotlib.org/stable/tutorials/index.html

 - Seaborn User Guide - https://seaborn.pydata.org/tutorial/function_overview.html ([documentation](https://seaborn.pydata.org/))
    

### Parametrising data



<img src="images/scikit-learn-logo.png" alt="Scikit-learn logo" style="display:block;margin-left:auto;margin-right:auto;width:15%"/>



The [scikit-learn package](https://scikit-learn.org/stable/) provides many methods for fitting parametrised models to data, from simple linear regression to more complex fits.


The scipy module also provides a lot of tools for both fitting data and performing statistical tests, including:


 - curve_fit - https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html



 - stats module - https://docs.scipy.org/doc/scipy/reference/stats.html (e.g. https://docs.scipy.org/doc/scipy/tutorial/stats.html#analysing-one-sample)
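As a small taste of `curve_fit`, the sketch below fits a straight line to synthetic data. The model function `straight_line`, the "true" parameter values and the noise level are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import curve_fit


def straight_line(x, gradient, intercept):
    """Model to fit: y = gradient * x + intercept."""
    return gradient * x + intercept


# Synthetic data generated from a known line plus random noise
rng = np.random.default_rng(seed=1)
x = np.linspace(0, 10, 50)
y = straight_line(x, 2.5, -1.0) + rng.normal(scale=0.5, size=x.size)

# popt holds the best-fit parameters, pcov their covariance matrix
popt, pcov = curve_fit(straight_line, x, y)
uncertainties = np.sqrt(np.diag(pcov))  # one-sigma parameter uncertainties

print("gradient  =", popt[0], "+/-", uncertainties[0])
print("intercept =", popt[1], "+/-", uncertainties[1])
```

The same pattern works for any model you can write as a function of `x` and its parameters, although more complicated models often need a starting guess passed through the `p0` argument.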

### Creating algorithms and models


#### Numpy and Scipy



We have seen how the numpy module can be used to create efficient algorithms and processes. Have a look at the Scipy Lecture Notes for good examples of array manipulation and broadcasting, and for effective ways to use these tools:


 - https://scipy-lectures.org/intro/numpy/operations.html
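For a quick reminder of what broadcasting means in practice, here is a minimal sketch using made-up numbers.

```python
import numpy as np

# Three measurements (rows) of two quantities (columns); values are made up
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Mean of each column: shape (2,)
column_means = data.mean(axis=0)

# Broadcasting: the (2,) array is stretched across all 3 rows of the (3, 2)
# array, so each column has its own mean subtracted without writing a loop
centred = data - column_means
print(centred)

# Indexing with np.newaxis gives a (3, 1) column and a (1, 3) row, which
# broadcast to a (3, 3) table of all pairwise differences
x = np.array([0.0, 1.0, 3.0])
differences = x[:, np.newaxis] - x[np.newaxis, :]
print(differences)
```

Both results would otherwise need explicit loops, which are slower and usually harder to read for large arrays.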

## Next week

 - Next week, there will be an optional drop-in session 

 - Same room, same time

 - Please feel free to bring any material you would like to this

 - You can use this time to work on the project

 - Or any other material you like

 - Final opportunity to ask questions!

# Thanks all 

## Good luck in your exams