# Useful basics of Python for scientists

### The following is a binder for some basic useful operations in Python

The point of this notebook/binder is to introduce some key concepts in working with code for science. It is structured similarly to how an actual data analysis notebook might be set up. **Super important:** Make sure you have opened the folder in the left bar (by clicking on it) so you can see any folders and files you create. Can you see where the README.md for the repo is? Then continue...


- <a href="#Lesson-One">Lesson One</a>: Making and deleting folders
- <a href="#Lesson-Two">Lesson Two</a>: Getting some basic statistics
- <a href="#Lesson-Three">Lesson Three</a>: Graphing stuff

In [None]:
# import libraries
import os
import shutil
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Lesson One

In [None]:
# here we will show an alternative way to make folders (in Python rather than bash)
# note: a relative path like this puts the new folder in the current working directory,
# which in Jupyter is normally the folder containing this notebook
os.mkdir('to_be_erased')

Let's break that down: We just used the Python os library. OS stands for operating system; we use it to work with files and folders on our operating system. os.mkdir() is a function that creates a new directory at a specified path (you pass the path in as a parameter, the value inside the parentheses). If the directory already exists, a FileExistsError is raised, so running this cell a second time gives an error.
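You can trigger the FileExistsError yourself without cluttering the repo. The sketch below works inside a temporary folder and then shows one common way to avoid the error. The `tempfile` module and `os.makedirs(..., exist_ok=True)` are standard-library tools that this notebook does not otherwise use, so treat them as an illustrative addition.

```python
import os
import tempfile

caught = False

# Work inside a throwaway temporary folder so nothing is left behind in the repo
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, 'to_be_erased')

    os.mkdir(target)          # first call creates the folder
    try:
        os.mkdir(target)      # second call fails because the folder already exists
    except FileExistsError as err:
        caught = True
        print('Caught:', type(err).__name__)

    # os.makedirs() with exist_ok=True quietly accepts a folder that is already there
    os.makedirs(target, exist_ok=True)
    still_there = os.path.isdir(target)

print(still_there)
```

`os.makedirs()` also creates any missing parent folders along the path, which `os.mkdir()` does not.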

In [None]:
# here we create six subfolders inside 'to_be_erased' with a for loop
for i in range(0, 6):
    end = "directory_" + str(i)
    os.mkdir(os.path.join('to_be_erased', end))

Let's break that down:
The first code line (`for i in range(0, 6):`) starts a for loop that gives the variable i a different number on each pass, starting at 0 and going up to 5 (range stops just before its end value). The indented lines below it run once for each of the numbers 0, 1, 2, 3, 4, 5. On each pass we convert i to text with str(i) and add it to the word "directory_", making a string (the data type for text) such as "directory_3". In the final line, os.path.join() combines 'to_be_erased' and that string into a path, and os.mkdir() creates that folder.
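To check what the loop produces before it touches the disk, here is a short sketch that builds the same names in a list (a list comprehension, which the notebook itself does not use) and shows what os.path.join() returns.

```python
import os

# range(0, 6) yields 0, 1, 2, 3, 4, 5 (the end value 6 is excluded)
numbers = list(range(0, 6))
print(numbers)

# The same folder names the loop builds, collected in a list instead of created on disk
names = ["directory_" + str(i) for i in numbers]
print(names)

# os.path.join inserts the path separator that fits your operating system
path = os.path.join('to_be_erased', names[0])
print(path)  # to_be_erased/directory_0 on macOS/Linux, to_be_erased\directory_0 on Windows
```

Using os.path.join() rather than gluing strings together with "/" keeps the code portable between operating systems.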

In [None]:
# here we will take down the whole folder tree
shutil.rmtree('to_be_erased')

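Let's break that down: shutil is another standard library module for file operations. Could this have been done with os alone? Yes, but os.rmdir() only removes empty folders, so you would first have to delete every file and subfolder yourself. shutil.rmtree() lets us remove a whole folder tree, with everything inside it, in one fell swoop. Be careful: there is no undo, and the deleted files do not go to the trash.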
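To make the "could this have been done with os?" point concrete, here is a sketch that deletes one folder tree using only os functions and an identical tree with a single shutil call. The helper `remove_tree_with_os` is hypothetical, written for this illustration only. Everything runs inside a temporary folder.

```python
import os
import shutil
import tempfile


def remove_tree_with_os(top):
    """Hypothetical helper: delete `top` and everything under it using only os.

    os.walk(..., topdown=False) visits the deepest folders first, so each
    folder is already empty by the time os.rmdir() reaches it.
    """
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames:
            os.remove(os.path.join(dirpath, name))
        for name in dirnames:
            os.rmdir(os.path.join(dirpath, name))
    os.rmdir(top)


base = tempfile.mkdtemp()

# Build two identical small trees: a folder containing a subfolder with one file
for tree in ('tree_os', 'tree_shutil'):
    sub = os.path.join(base, tree, 'directory_0')
    os.makedirs(sub)
    with open(os.path.join(sub, 'notes.txt'), 'w') as f:
        f.write('temporary')

remove_tree_with_os(os.path.join(base, 'tree_os'))   # several os calls
shutil.rmtree(os.path.join(base, 'tree_shutil'))      # one shutil call

remaining = os.listdir(base)
print(remaining)
os.rmdir(base)  # base is now empty, so os.rmdir() is enough here
```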

## Lesson Two
Some basic statistics

Here I will show how descriptive statistics can be computed with one-liners of code.

Below we will make a fake dataset.

The following cell contains entirely made-up data, in no way related to any actual patients or participants.

In [None]:
# made-up sample data for 7 subjects
data = {
    'participant_id': [101, 102, 103, 104, 105, 106, 107],
    'tibialis_anterior_volume_left': [1500, 1600, 1580, 1550, 1620, 1123, 1123],
    'tibialis_anterior_volume_right': [1520, 1610, 1575, 1560, 1600, 1333, 1333],
    'gluteus_max_volume_left': [7500, 7700, 7600, 7650, 7800, 1222, 7888],
    'gluteus_max_volume_right': [7550, 7680, 7580, 7700, 7850, 7890, 5678],
    'soleus_volume_left': [3000, 3100, 3050, 3080, 3150, 5567, 3213],
    'soleus_volume_right': [3020, 3090, 3060, 3070, 3140, 2343, 3454],
}

# make a DataFrame
df = pd.DataFrame(data)

# display the first 2 rows of the DataFrame
df.head(2)


Want to break that down? Go to the pandas documentation at [https://pandas.pydata.org/docs/](https://pandas.pydata.org/docs/) and search for DataFrame.
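As a quick preview of what the documentation describes, here is a sketch built on a smaller made-up table (`df_small` is a name chosen just for this example). It shows how dict keys become columns, the two ways of selecting a column that Lesson Three uses, and element-by-element column arithmetic.

```python
import pandas as pd

# A smaller made-up table built the same way: each dict key becomes a column
data = {
    'participant_id': [101, 102, 103],
    'soleus_volume_left': [3000, 3100, 3050],
    'soleus_volume_right': [3020, 3090, 3060],
}
df_small = pd.DataFrame(data)

print(df_small.shape)          # (rows, columns)
print(list(df_small.columns))  # column names, in the dict's order

# Two equivalent ways to pull out one column as a Series
by_brackets = df_small['soleus_volume_left']
by_attribute = df_small.soleus_volume_left  # only works if the name is a valid Python identifier
print(by_brackets.equals(by_attribute))

# Column arithmetic works element by element, row by row
df_small['soleus_difference'] = df_small['soleus_volume_left'] - df_small['soleus_volume_right']
print(df_small)
```

Bracket access always works. Attribute access (`df.column_name`) is shorter but fails for names with spaces or names that clash with an existing DataFrame method.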

In [None]:
df.describe()
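
Each row of the describe() output is itself a one-liner. The sketch below checks two details that matter when you report results. `std` is the *sample* standard deviation (ddof=1, dividing by n - 1), and the `50%` row is the median. It uses the left tibialis column from the made-up data above.

```python
import numpy as np
import pandas as pd

volumes = pd.Series([1500, 1600, 1580, 1550, 1620, 1123, 1123],
                    name='tibialis_anterior_volume_left')

summary = volumes.describe()
print(summary)

# describe() reports the sample standard deviation (ddof=1)
print(np.isclose(summary['std'], np.std(volumes.to_numpy(), ddof=1)))
# NumPy's default is ddof=0 (population formula), which gives a smaller value
print(np.std(volumes.to_numpy()), summary['std'])
# the '50%' row is the median
print(summary['50%'] == volumes.median())
```

If you compute a standard deviation with NumPy directly, pass `ddof=1` to match what pandas reports.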

In [None]:
df.corr(method='spearman')
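
Spearman's rho is Pearson's r computed on the *ranks* of the values rather than the values themselves. This makes it less sensitive to a single extreme value, such as participant 106's gluteus value in the made-up data. The sketch below confirms the rank equivalence on two of the columns above, using `DataFrame.rank()`.

```python
import pandas as pd

# Two columns from the made-up dataset above
df = pd.DataFrame({
    'gluteus_max_volume_left': [7500, 7700, 7600, 7650, 7800, 1222, 7888],
    'soleus_volume_left': [3000, 3100, 3050, 3080, 3150, 5567, 3213],
})

spearman = df.corr(method='spearman')
# Spearman's rho equals Pearson's r computed on the ranks of each column
pearson_on_ranks = df.rank().corr(method='pearson')
pearson_raw = df.corr(method='pearson')

print(spearman.round(3))
print(pearson_on_ranks.round(3))
print(pearson_raw.round(3))
```

Comparing `pearson_raw` with `spearman` shows how much one extreme value can move Pearson's r.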

But what if I don't know which methods exist, and am too lazy to read the whole documentation?

In [None]:
df.corr?
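
The `?` suffix is a Jupyter/IPython feature and does not work in a plain Python script. The sketch below shows standard-library alternatives: `help()` prints the same docstring, `dir()` lists available methods, and `inspect.signature()` shows just the parameters. The tiny DataFrame is made up for the example.

```python
import inspect
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 1, 2]})

# Outside Jupyter, help() prints the same docstring that `df.corr?` shows
# help(df.corr)

# List the methods and attributes whose names start with 'c'
c_names = [name for name in dir(df) if name.startswith('c')]
print(c_names)

# Show just the parameters of DataFrame.corr
sig = inspect.signature(pd.DataFrame.corr)
print(sig)
```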

## Lesson Three

Graphing

Here we will make a quick, basic scatter plot.

In [None]:
plt.scatter(df.tibialis_anterior_volume_left, df.tibialis_anterior_volume_right)

Now we can get a bit fancier in several ways. What if we want to emphasize the relationship with a line? In this case the relationship looks linear, i.e., a first-degree polynomial.

In [None]:

x = np.array(df.tibialis_anterior_volume_left)
y = np.array(df.tibialis_anterior_volume_right)

# fit a 1st degree polynomial (i.e., y = mx + b)
slope, intercept = np.polyfit(x, y, 1)

# generate predicted y values
y_fit = slope * x + intercept

# Plot data and best-fit line
plt.scatter(x, y, color='blue', label='Data points')
plt.plot(x, y_fit, color='red', label=f'Best-fit line: y = {slope:.2f}x + {intercept:.2f}')

# Labels and grid
plt.xlabel('left tibialis')
plt.ylabel('right tibialis')
plt.title('Tibialis L v. R with Linear Fit')
plt.legend()
plt.grid(True)
plt.show()
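
np.polyfit with degree 1 performs an ordinary least-squares fit. The sketch below computes the same slope and intercept from the textbook closed-form formulas to show what polyfit is doing. It then adds R², the share of the variance in y that the line explains, which is a common way to quantify the fit. The data are the made-up tibialis volumes from Lesson Two.

```python
import numpy as np

# Tibialis anterior volumes from the made-up dataset in Lesson Two
x = np.array([1500, 1600, 1580, 1550, 1620, 1123, 1123], dtype=float)
y = np.array([1520, 1610, 1575, 1560, 1600, 1333, 1333], dtype=float)

# Closed-form least-squares estimates for y = m*x + b
x_mean, y_mean = x.mean(), y.mean()
slope_manual = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept_manual = y_mean - slope_manual * x_mean

# np.polyfit with deg=1 returns [slope, intercept] (highest power first)
slope_np, intercept_np = np.polyfit(x, y, 1)
print(np.isclose(slope_manual, slope_np), np.isclose(intercept_manual, intercept_np))

# Coefficient of determination R^2 = 1 - (residual sum of squares / total sum of squares)
y_fit = slope_np * x + intercept_np
r_squared = 1 - np.sum((y - y_fit) ** 2) / np.sum((y - y_mean) ** 2)
print(f'R^2 = {r_squared:.3f}')
```

With only seven points, two of which are identical, a high R² here says little about real data. It is a sketch of the calculation, not an analysis.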


Or what if we want to look at, or emphasize, individual participants?

In [None]:
x = df['tibialis_anterior_volume_right']
y = df['tibialis_anterior_volume_left']
# labels and a colour map with a distinct colour for each participant
labels = df['participant_id']
cmap = plt.get_cmap('tab20')
for i in range(len(x)):
    plt.scatter(x[i], y[i], color=cmap(i), label=labels[i])
# place the legend outside the axes, to the right of the plot
plt.legend(title='Participant', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xlabel('tibialis right')
plt.ylabel('tibialis left')
plt.show()
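
Another way to emphasize individual participants is to write each ID next to its point with `annotate()`, so the reader does not have to match colours to a legend. This sketch uses matplotlib's object-oriented `fig, ax` interface and the non-interactive `Agg` backend. Both are choices made for this example; in Jupyter you can drop the `matplotlib.use('Agg')` line.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'participant_id': [101, 102, 103, 104, 105, 106, 107],
    'tibialis_anterior_volume_left': [1500, 1600, 1580, 1550, 1620, 1123, 1123],
    'tibialis_anterior_volume_right': [1520, 1610, 1575, 1560, 1600, 1333, 1333],
})

fig, ax = plt.subplots()
ax.scatter(df['tibialis_anterior_volume_right'], df['tibialis_anterior_volume_left'])

# Write each participant ID slightly up and to the right of its point
for _, row in df.iterrows():
    ax.annotate(str(row['participant_id']),
                (row['tibialis_anterior_volume_right'], row['tibialis_anterior_volume_left']),
                textcoords='offset points', xytext=(5, 5))

ax.set_xlabel('tibialis right')
ax.set_ylabel('tibialis left')
n_labels = len(ax.texts)
plt.close(fig)
```

In the made-up data, participants 106 and 107 have identical tibialis values, so their points sit exactly on top of each other. With labels the overlap is easy to spot. In the coloured-legend version, the later colour simply hides the earlier one.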

You can make decent graphs with matplotlib alone. For more advanced graphing, consider adding other libraries, e.g., seaborn or plotly.