# **PART 2: PYTHON FOR SCIENTISTS: PACKAGES/LIBRARIES**

Some of the most powerful tools you'll use in Python are packages and libraries. These contain ready-made functions, ranging from quite simple to more advanced ones. We'll go through only a few basic examples today, but we'll leave you with a list of libraries and packages to check out:

- numpy 
- matplotlib
- seaborn
- scipy
- pandas

**You'll need to import every package you want to use**. This is not a one-time thing: you have to do it in each new Python notebook, and again whenever the notebook's kernel restarts. With time you'll have a pretty good sense of which packages you need (and Stack Overflow to the rescue, always!).

In [None]:
# Try it yourself
import numpy

In [None]:
# Calculate the mean in a new way
array = [2, 4, 6, 8, 10]

averageOfArray = numpy.mean(array)
print(averageOfArray)

In [None]:
# Alternatively, you can use abbreviations for packages - this is really handy. 
import numpy as np

averageOfArray = np.mean(array)
print(averageOfArray)

In [None]:
# And, of course, you can print or use any of these directly, such as: 
print(np.mean(array))

In [None]:
# NumPy can also create arrays that are already filled with values
a = np.zeros(2)  # One dimension, two elements
print(a) 

print('') # Space to separate outputs

# Even if you need multiple dimensions
b = np.ones((2, 2))
print(b)

print('') # Space to separate outputs

# You can also create an array filled with a single value of your choice, for example:
x = 27.34
c = np.full(5, x)
print(c)
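
Beyond constant-filled arrays, NumPy can also prefill an array with evenly spaced values. The cell below is a minimal sketch using `np.arange` and `np.linspace`; the variable names (`evens`, `fractions`, `grid`) are made up for illustration.

```python
import numpy as np

# Integers from 0 up to (but not including) 10, in steps of 2
evens = np.arange(0, 10, 2)
print(evens)

# Five evenly spaced values from 0 to 1, both endpoints included
fractions = np.linspace(0, 1, 5)
print(fractions)

# The shape attribute tells you the dimensions of any array
grid = np.zeros((2, 3))
print(grid.shape)
```

Checking `.shape` is a quick way to confirm that an array has the dimensions you intended before you start filling it with data.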

### **2.1 DATA VISUALIZATION**

In [None]:
# Let's start by plotting something quite simple using a line plot in matplotlib (remember to import this in every notebook)
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 5, 6, 11, 14]

plt.plot(x, y, linestyle='--')

In [None]:
# Styles can also be spelled out by name (the same goes for colors, etc.)
plt.plot(x, y, linestyle='dashed', color='orange', linewidth=8)
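
A plot is usually easier to read once its axes are labeled. The cell below is one possible sketch using matplotlib's `plt.subplots()` interface; the label text, title, and legend entry are placeholders chosen for illustration, not part of the original example.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 5, 6, 11, 14]

# plt.subplots() returns a figure and an axes object to draw on
fig, ax = plt.subplots()
ax.plot(x, y, linestyle='dashed', color='orange', linewidth=2, label='measurements')

# Placeholder labels: replace these with names that describe your own data
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('A labeled line plot')
ax.legend()
```

In a notebook the figure appears under the cell automatically; in a plain Python script you would likely need to call `plt.show()` as well.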


In [None]:
# It's likely you'll often want to look at a relationship between two variables. 
# You can do this simply by scattering them in matplotlib. 

x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
y = [2, 5, 6, 11, 14, 5, 2, 7, 8, 9, 1, 2, 3, 7, 3]

plt.scatter(x, y, color = 'royalblue')


In [None]:
# More powerful visualizations are available in seaborn.
# Here, we are plotting the exact same data as in the figure above, but we are also drawing
# the line of best fit for this data.

import pandas as pd
import seaborn as sns 

dataframe = pd.DataFrame({'x': x, 
                        'y': y})

g = sns.lmplot(x='x', y='y', data = dataframe)
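
If you want the numbers behind a line of best fit, one option is to fit a straight line yourself. The cell below is a sketch that uses `np.polyfit` on the same data; this is a stand-in for illustration and not necessarily how seaborn computes its line internally.

```python
import numpy as np

x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
y = [2, 5, 6, 11, 14, 5, 2, 7, 8, 9, 1, 2, 3, 7, 3]

# Fit a degree-1 polynomial (a straight line) by least squares
slope, intercept = np.polyfit(x, y, 1)
print('slope:', slope)
print('intercept:', intercept)

# Use the fitted line to predict y at x = 3
prediction = slope * 3 + intercept
print('prediction at x = 3:', prediction)
```

The slope tells you how much `y` changes, on average, for each one-unit increase in `x`.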

In [None]:
# Want to know how correlated your variables are? 
# There are always multiple ways to do things in Python. Here is an example of 
# how even something as simple as a correlation between two variables can be calculated 
# using multiple packages. 

x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
y = [2, 5, 6, 11, 14, 5, 2, 7, 8, 9, 1, 2, 3, 7, 3]

print(np.corrcoef(x, y))
print('Correlation using numpy: ', np.corrcoef(x, y)[0, 1]) # What do you think [0, 1] means?

from scipy.stats import linregress

print(linregress(x, y))
print('Correlation using scipy: ', linregress(x, y).rvalue)
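
To see what these functions compute, here is a sketch of the Pearson correlation coefficient worked out by hand from its definition and compared against numpy's result. The names `dx`, `dy`, `r_manual`, and `r_numpy` are made up for this example.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
y = np.array([2, 5, 6, 11, 14, 5, 2, 7, 8, 9, 1, 2, 3, 7, 3])

# How far each value sits from the mean of its variable
dx = x - x.mean()
dy = y - y.mean()

# Pearson r: the summed products of the deviations, scaled by their overall spread
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
r_numpy = np.corrcoef(x, y)[0, 1]

print('Correlation by hand:     ', r_manual)
print('Correlation using numpy: ', r_numpy)
```

Both lines should print the same value up to floating-point rounding.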

#### **TASK 5**:

So far in this class we've avoided using external datasets, but it's hard to practice visualizing larger datasets with simulated data alone. So, for the next few examples we'll ask you to play with a well-known dataset called Iris, available through seaborn, which contains some flower feature data.

This data is stored in a pandas DataFrame, which means that the most useful tools for working with it will be pandas functions and methods. Working with a dataset like this is a great example of a task that will require a bit of googling: pasting together bits of code other people have written, or reading the documentation for the functions you are interested in.
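
As a warm-up, here is a sketch of a few common pandas operations on a tiny DataFrame. The values and species labels below are made up for illustration and are not the real Iris measurements.

```python
import pandas as pd

# Hypothetical measurements, invented for this example (not the Iris data)
flowers = pd.DataFrame({
    'petal_length': [1.4, 1.5, 4.7, 4.5],
    'petal_width': [0.2, 0.2, 1.4, 1.5],
    'species': ['a', 'a', 'b', 'b'],
})

# Select a single column by name
print(flowers['petal_length'])

# Keep only the rows that satisfy a condition
long_petals = flowers[flowers['petal_length'] > 2]
print(long_petals)

# Average each numeric column within each species
species_means = flowers.groupby('species').mean()
print(species_means)
```

The same patterns (selecting a column, filtering rows, grouping) should carry over to any DataFrame, whatever its column names.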

In [None]:
import seaborn as sns

iris = sns.load_dataset("iris") # Load the Iris dataset as a pandas DataFrame
iris.describe() # Useful method for getting summary statistics of your data

In [None]:
# Let's have a look at the data. 
# For 150 flowers, we have their sepal length, sepal width, petal length, petal width and species. Neat!
print(iris.shape)
iris.head()

Now try to plot lengths against widths of the petals for all flower types. The kind of plot you'll want to use is most likely a scatterplot, so see if you can google your way to the syntax that will take petal length and width as inputs for each datapoint.

In [None]:
# Type your code below:
