# **Essential Python Coding**

This tutorial will introduce you to essential Python coding, including the following:

* Importing libraries
* Importing data
* Data types and functions
* Dataframes
* Dropping nulls/duplicates
* Filtering data


We will be using the following libraries: 
```
- pandas
- seaborn
```

## **Importing Libraries**
Libraries in Python are collections of ready-made tools that save you from writing everything yourself. 
The code below imports two packages: seaborn and pandas. The 'as' keyword (shown in purple in Colab) gives each library a shorter nickname (alias), which saves time whenever you refer to the library in your code.

In [None]:
import seaborn as sns
import pandas as pd
from seaborn import get_dataset_names
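
As a small illustration of how an alias behaves, the sketch below uses the standard library's math module. It is chosen here only because it needs no installation and is not part of this tutorial's data workflow; the alias 'm' and the variable names are arbitrary choices for the example.

```python
import math        # full module name
import math as m   # the same module, bound to the shorter alias "m"

# Both names refer to the same module object, so the two calls are equivalent.
full_name_result = math.sqrt(16)
alias_result = m.sqrt(16)

print(full_name_result, alias_result)
print(m is math)
```

The same idea applies to 'pd' and 'sns': the alias is just a shorter name for the library.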

## **Data Types and Functions**

Values can be assigned to variables (here, single letters) using the = symbol. The code below provides an example.  

In [None]:
a = 2 
b = 3
c = '4'

In [7]:
a = 2
print(a)

2


The **print** function displays the value of whatever is placed within the brackets. 

In [None]:
print(a,b,c)

2 3 4


In [None]:
print(a+b)

5


However, you will get an error message if an operation does not make sense for the types of the values involved. 

It is important to consider the type of data you are using. The code below shows that the value assigned to 'a' is an integer or 'int', meaning it is a whole number. 

The value assigned to c, however, is a string or 'str', because it was written inside quotation marks. 

In [None]:
type(a)

int

In [None]:
type(c)

str

The code below does not work because Python cannot add an integer to a string value. 

In [None]:
print(a+b+c)

TypeError: unsupported operand type(s) for +: 'int' and 'str'
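
One way to read the full error message without stopping the notebook is to catch the exception. The following is a minimal sketch; the variable name 'message' is introduced here for illustration only.

```python
a = 2
b = 3
c = '4'  # a string, not a number

try:
    print(a + b + c)  # int + str is not defined, so Python raises TypeError
except TypeError as error:
    message = str(error)
    print('TypeError:', message)
```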

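Multiplication is different: a string is a sequence of characters, and multiplying it by an integer repeats it. Here a+b is 5, so the string '4' is repeated five times. 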

In [None]:
print((a+b)*c)

44444
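
To make the difference between repeating a string and multiplying numbers concrete, here is a short sketch. The names 'repeated' and 'product' are chosen for this example only.

```python
a = 2
b = 3
c = '4'

repeated = (a + b) * c        # int * str repeats the string 5 times
product = (a + b) * int(c)    # int * int is ordinary multiplication

print(repeated)  # 44444
print(product)   # 20
print(type(repeated), type(product))
```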


**However**, the str data can be converted to an integer using the int() function as seen below. 

In [None]:
c=int(c)

In [None]:
print(c)

4


Now that c is an integer, the addition that failed earlier works and **print** outputs the total. 

In [None]:
print(a+b+c)

9
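
Note that int() only accepts strings that look like whole numbers. The sketch below shows what happens with other strings; 'to_int_or_none' is a hypothetical helper written for this illustration, not a built-in function.

```python
def to_int_or_none(text):
    """Return text converted to int, or None if it is not a whole number."""
    try:
        return int(text)
    except ValueError:
        return None

print(to_int_or_none('4'))      # 4
print(to_int_or_none(' 4 '))    # 4 (surrounding whitespace is ignored)
print(to_int_or_none('4.5'))    # None: int() rejects decimal strings
print(to_int_or_none('four'))   # None
print(int(float('4.5')))        # 4: convert to float first, then drop the decimal
```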


Below, the word **hello** has been assigned to d, and **world** to e. The print() function shows two ways of outputting the phrase 'hello world'. 

In [None]:
d = 'hello' 
e = 'world'

In [None]:
print(d, e)

hello world


In [None]:
print(d+' '+e)

Both print(d, e) and print(d+' '+e) put a space between hello and world: the first because print separates its arguments with a space by default, the second because the space is added explicitly. The code below shows the output without a space. 

In [None]:
print(d+e)

helloworld
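
Two other common ways to control spacing are f-strings and the 'sep' argument of print. The following is a brief sketch; the names 'with_space' and 'without_space' are chosen for this example.

```python
d = 'hello'
e = 'world'

with_space = f'{d} {e}'     # f-string: variables are inserted inside the braces
without_space = f'{d}{e}'

print(with_space)
print(without_space)
print(d, e, sep='')    # sep replaces the default single space between arguments
print(d, e, sep='-')
```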


In [None]:
pow(5, 2)

25

In [None]:
5**2

25

In [None]:
type(a) #whole numbers

int

In [None]:
type(d) #str = string of data - any characters. Allow text and words.  

str

In [None]:
r=5.9

In [None]:
type(r) # float: a floating-point number (stored in binary) that can hold decimals. int() drops the decimal part without rounding (it truncates toward zero). 

float

In [None]:
int(r)

5
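
The difference between dropping the decimal, flooring and rounding only shows up clearly with negative numbers. The sketch below compares them; 'negative_r' is a value added for this illustration.

```python
import math

r = 5.9
negative_r = -5.9

# int() truncates toward zero, math.floor() rounds down, round() rounds to nearest
print(int(r), math.floor(r), round(r))                             # 5 5 6
print(int(negative_r), math.floor(negative_r), round(negative_r))  # -5 -6 -6
```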

In [None]:
d.islower() # Is lower case

True

### **Exercise 1**
> Assign your name as a string

> Assign your age as an integer

> Print your name and your age

In [None]:
# Write code for assigning your name



In [None]:
# Write code for assigning your age



In [None]:
# Return data types for name and age



In [None]:
# Print your name and age in a sentence within the same print statement



In [6]:
#@markdown Click here to reveal the answer
name = 'Jisc'
age = 29

print(type(name))
print(type(age))

print(name, 'is', age, 'years old.')

print(name + ' is ' + str(age) + ' years old.')



<class 'str'>
<class 'int'>
Jisc is 29 years old.
Jisc is 29 years old.


## **Importing Data**

First, download the data as a CSV file from the HESA website: click [here](https://www.hesa.ac.uk/data-and-analysis/finances/chart-1), download the CSV file under the chart and save it locally.

Next, upload the file to Google Colab. The following cell loads a file into Google Colab's temporary storage so that the pandas read_csv function can then import the data.

Run the cell below and click 'Choose Files' to select the file you require.

In [8]:
from google.colab import files
  
uploaded = files.upload()

Saving oc031-chart-1.csv to oc031-chart-1.csv


Import the pandas library if it is not already imported. This package turns the data into a dataframe, which makes the data easier to work with. 

Give your dataframe a name; in this case it is called **df**. The pd.read_csv function reads the .csv file into Colab. 

The **skiprows** argument states how many rows at the top of the file should be skipped, so that you only read in the data you need and avoid the notes that typically sit in the top rows of the .csv file. In this case, 10 rows are skipped. 

The **print** function displays the dataframe. Because this table has only 6 rows, all of them are shown. 

In [9]:
import pandas as pd  # provides the dataframe type and read_csv
 
# Skip the 10 rows of notes at the top of the file, then read the table
df = pd.read_csv('oc031-chart-1.csv', skiprows=10)
print(df)

      Year Tuition fees and education contracts Funding body grants  \
0  2014/15                               15,541               5,345   
1  2015/16                               16,811               5,167   
2  2016/17                               17,757               5,105   
3  2017/18                               19,018               5,124   
4  2018/19                               20,300               5,326   
5  2019/20                               21,546               5,499   

  Research grants and contracts Other income  Investment income  \
0                         5,968        5,902                230   
1                         5,886        6,045                261   
2                         5,916        6,165                254   
3                         6,224        7,363                256   
4                         6,577        7,964                401   
5                         6,293        7,285                373   

   Donations and endowments  
0 
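
To see the effect of skiprows without needing the HESA file, the sketch below reads a small made-up CSV from a text string. The contents of 'raw_text', the column names and the name 'example_df' are all invented for this illustration.

```python
import io
import pandas as pd

# Hypothetical file contents: two lines of notes, then the header and data.
raw_text = (
    "Title: example chart\n"
    "Source: made-up values for illustration\n"
    "Year,Income\n"
    "2014/15,100\n"
    "2015/16,120\n"
)

# skiprows=2 ignores the two note lines, so the third line becomes the header.
example_df = pd.read_csv(io.StringIO(raw_text), skiprows=2)

print(example_df)
print(list(example_df.columns))
```

Without skiprows, pandas would try to treat the first note line as the header, so the number to skip should match the number of note lines in your own file.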

Examine the data types

In [None]:
df.dtypes
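
Numbers written with thousands separators, such as 15,541, are read in as text rather than as numbers, so they cannot be summed or compared numerically until they are converted. The sketch below shows one possible conversion on a small made-up dataframe; 'example_df' and the 'Income' column are names invented for this illustration.

```python
import pandas as pd

# Made-up values written with thousands separators, stored as strings.
example_df = pd.DataFrame({
    'Year': ['2014/15', '2015/16'],
    'Income': ['15,541', '16,811'],
})
print(example_df.dtypes)  # Income is not a numeric type yet

# Remove the commas, then convert the cleaned strings to integers.
example_df['Income'] = (
    example_df['Income'].str.replace(',', '', regex=False).astype(int)
)
print(example_df.dtypes)
print(example_df['Income'].sum())  # 32352
```

As an alternative, pd.read_csv accepts a thousands=',' argument that handles the separators while the file is being read.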

Check for any missing values in the 'Other income' variable

In [11]:
pd.isna(df['Other income'])

0    False
1    False
2    False
3    False
4    False
5    False
Name: Other income, dtype: bool
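
For larger tables, a list of True/False values is hard to scan, so it can help to count the missing values instead. The following is a sketch on a small made-up dataframe (the HESA data above has no missing values in this column); 'example_df' and its values are invented for this illustration.

```python
import pandas as pd

example_df = pd.DataFrame({
    'Year': ['2014/15', '2015/16', '2016/17'],
    'Other income': [5902, None, 6165],
})

missing_per_column = example_df.isna().sum()           # count of missing values per column
any_missing = example_df['Other income'].isna().any()  # True if at least one is missing

print(missing_per_column)
print(any_missing)
```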

Drop every row that contains a missing value anywhere in the dataframe

In [10]:
dataFrame = df.dropna()
dataFrame

| | Year | Tuition fees and education contracts | Funding body grants | Research grants and contracts | Other income | Investment income | Donations and endowments |
|---|---|---|---|---|---|---|---|
| 0 | 2014/15 | 15541 | 5345 | 5968 | 5902 | 230 | 532 |
| 1 | 2015/16 | 16811 | 5167 | 5886 | 6045 | 261 | 578 |
| 2 | 2016/17 | 17757 | 5105 | 5916 | 6165 | 254 | 585 |
| 3 | 2017/18 | 19018 | 5124 | 6224 | 7363 | 256 | 737 |
| 4 | 2018/19 | 20300 | 5326 | 6577 | 7964 | 401 | 927 |
| 5 | 2019/20 | 21546 | 5499 | 6293 | 7285 | 373 | 931 |
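
The HESA table has no missing or repeated rows, so dropna leaves it unchanged. To show what dropping nulls and duplicates looks like when they are present, here is a sketch on a small made-up dataframe; 'example_df' and its values are invented for this illustration.

```python
import pandas as pd

example_df = pd.DataFrame({
    'Year': ['2014/15', '2014/15', '2015/16', '2016/17'],
    'Income': [100.0, 100.0, None, 140.0],
})

no_missing = example_df.dropna()                          # drop rows with any missing value
no_missing_income = example_df.dropna(subset=['Income'])  # only check the Income column
no_duplicates = example_df.drop_duplicates()              # drop rows that repeat an earlier row

print(len(example_df), len(no_missing), len(no_duplicates))  # 4 3 3
```

Both methods return a new dataframe and leave the original untouched, which is why the results are assigned to new names.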


## **Filtering**

You can filter (or subset) the dataframe by different attributes. For example you can select a certain year of the HESA data to be shown.

In [13]:
df_2017 = dataFrame[dataFrame['Year']=='2017/18']

In [16]:
df_2017.head()

| | Year | Tuition fees and education contracts | Funding body grants | Research grants and contracts | Other income | Investment income | Donations and endowments |
|---|---|---|---|---|---|---|---|
| 3 | 2017/18 | 19018 | 5124 | 6224 | 7363 | 256 | 737 |
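
Filters can also combine several conditions or match a list of values. The sketch below uses a small made-up dataframe; 'example_df', its values and the result names are invented for this illustration.

```python
import pandas as pd

example_df = pd.DataFrame({
    'Year': ['2014/15', '2015/16', '2016/17', '2017/18'],
    'Income': [100, 120, 140, 160],
})

# Combine conditions with & (and) or | (or); each condition needs its own brackets.
recent_and_large = example_df[
    (example_df['Year'] != '2014/15') & (example_df['Income'] >= 140)
]

# isin keeps the rows whose value appears in the given list.
selected_years = example_df[example_df['Year'].isin(['2015/16', '2017/18'])]

print(recent_and_large)
print(selected_years)
```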


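Use the following commands to explore the data: info summarises the columns, their data types and non-null counts; describe gives summary statistics for the numeric columns; shape gives the number of rows and columns.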



In [None]:
df.info()
print(df.describe())
print(df.shape)  # shape is an attribute, so it takes no brackets

Seaborn has several built-in datasets, one of which we will use to practise the essentials of Python coding.

### **Exercise 2**

In [None]:
df = sns.load_dataset("penguins")
df.head(100)

> Convert the flipper_length_mm to an integer

> Filter the data to penguins with flipper lengths 200mm and over

In [None]:
# Write code here



In [37]:
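#@markdown Show answer
# pd.to_numeric makes sure the column is numeric; it stays as a float
# because the column contains missing values, which the int type cannot hold.
df['flipper_length_mm'] = pd.to_numeric(df['flipper_length_mm'])

# Keep only the rows with a flipper length of 200 mm or more
big_flippers = df[df['flipper_length_mm']>=200]

big_flippers.head()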

| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 53 | Adelie | Biscoe | 42.0 | 19.5 | 200.0 | 4050.0 | Male |
| 90 | Adelie | Dream | 35.7 | 18.0 | 202.0 | 3550.0 | Female |
| 91 | Adelie | Dream | 41.1 | 18.1 | 205.0 | 4300.0 | Male |
| 95 | Adelie | Dream | 40.8 | 18.9 | 208.0 | 4300.0 | Male |
| 101 | Adelie | Biscoe | 41.0 | 20.0 | 203.0 | 4725.0 | Male |
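
The flipper lengths above still display as decimals (200.0) because the column is stored as a float. If true integers are wanted, two possible approaches are sketched below on a small made-up column, so that the example runs without downloading the penguins data; 'example_df' and its values are invented for this illustration.

```python
import pandas as pd

example_df = pd.DataFrame({'flipper_length_mm': [181.0, None, 203.0, 200.0]})

# Option 1: the nullable "Int64" type stores whole numbers and keeps missing values.
as_nullable_int = example_df['flipper_length_mm'].astype('Int64')

# Option 2: drop the missing rows first, then use the ordinary int type.
as_plain_int = example_df['flipper_length_mm'].dropna().astype(int)

# Filter to flipper lengths of 200 mm or more
big_flippers_int = as_plain_int[as_plain_int >= 200]

print(as_nullable_int.dtype)
print(as_plain_int.tolist())
print(big_flippers_int.tolist())
```

Option 1 keeps every row, while option 2 removes the rows with missing measurements, so the choice depends on whether those rows matter for the analysis.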
