# Python Basics Walkthrough

👉 **Note about `print()`:**
- In Jupyter, if you just type a variable name, its value is shown as output.
- If you want to display multiple things, or add text around them, use `print()`.
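
As a small illustration of the second point, the sketch below prints a label and a value together. The variable names `name`, `day` and `message` are made up for this example.

```python
# Hypothetical values, used only for illustration
name = "ASDS"
day = 3

# print() accepts several arguments and separates them with a space
print("Camp:", name, "Day:", day)

# An f-string builds the same kind of message as a single string
message = f"{name} Coding Camp, day {day}"
print(message)
```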

# ASDS Coding Camp - Day 3

## Python basics

In [None]:
# The hash '#' sign is used to comment out text
# Text preceded by # will not be read by Python

In [None]:
This will be read by Python, and will produce an error message

There is no exact equivalent in Python to R's ls() function. \
\
Calling $\texttt{dir()}$ in Python is similar to calling $\texttt{ls()}$ and $\texttt{search()}$ simultaneously in R: it provides a list of names of objects in the current *scope*, which is similar to R's concept of environment.

In [None]:
dir()

In [None]:
a = "Hello world"

In [None]:
print(a)

The core distribution of Python has very limited data science capabilities. For this reason, doing data analysis in Python requires additional libraries to be attached. 


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

The three most important of these libraries are $\textbf{numpy}$ (short for Numerical Python), which provides some additional data structures for arrays; $\textbf{pandas}$ (short for Panel Data), which does the same for tabular data; and $\textbf{matplotlib}$, which provides graphical capabilities. Anaconda comes with all three installed, and to use them in a Python session we *import* them *as* an abbreviated version. 
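
The *import ... as* pattern works the same way for any module. As a minimal sketch, the example below uses `statistics` from the standard library as a stand-in for numpy or pandas, so that it runs without extra installation; the alias `stats` and the list `values` are assumptions made for this illustration.

```python
# statistics is part of the standard library; it stands in here for
# numpy/pandas so the example runs without any extra installation.
import statistics as stats

values = [2, 4, 6, 8]  # hypothetical data

# After "import ... as stats", the module is reached through the alias
average = stats.mean(values)
print(average)
```

Writing `stats.mean(...)` here mirrors writing `pd.DataFrame(...)` or `np.array(...)` once those libraries are imported under their aliases.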

In [None]:
# conda install pandas
# conda update pandas

### Help! 

Accessing help in Python is similar to R. Calling the help() function, and passing the name of a function within the brackets, will call the help file for that function. In Jupyter it is also generally possible to use a question mark (?) as a shortcut, though be careful: this is a feature of IPython/Jupyter rather than a "built-in" capability of Python itself. 
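
The text that help() displays comes from the function's *docstring*. The sketch below, which assumes a made-up function called `double`, suggests how help() picks up a docstring you write yourself and how the same text can be read from the `__doc__` attribute.

```python
def double(x):
    """Return x multiplied by two."""
    # 'double' is a hypothetical function written only for this example
    return x * 2

# help() prints the signature and the docstring of the function
help(double)

# The docstring itself is stored on the function as a plain string
doc = double.__doc__
print(doc)
```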

In [None]:
#help()

In [None]:
# help? and ?help will do the same 

Next, we call on a built-in function, len(), which works like length() in R - or does it? \
When we pass our recently created object "a" to the len() function, what value does it return?

In [None]:
len(a)

In [None]:
len?

In [None]:
help(len)

### Making data 

Mostly in Python you will be dealing with data which you have imported from external sources, or data types associated with numpy or pandas. However, it is good to know about Python's basic data structures, because everything else is built on them, and they can be less intuitive to work with than R's. \
\
In R, the assignment operator is <- . In Python, we use the equals sign (=).

In [None]:
b = 5

In [None]:
print(b)

In [None]:
# Integer
b = -125

In [None]:
# Float
score = 0.452
result = 1e-5

In [None]:
# Boolean 
answer = True
test_val = False

In [None]:
# String
mytext = "this is some text"
hello_str = "hello" + "world"

In [None]:
mytext

In [None]:
hello_str

### LISTS

Python has numerous types of data structures. "a" is a string (str) and "b" is an integer (int); both are types of *scalar*, or single values. "c" and "d", created below, are both *lists*, which contain multiple values. Note the use of square brackets [] and comma separation of values. Note also that a list can contain both integers and strings.
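
One way to check these claims is the built-in type() function. The sketch below re-creates the four objects with the values used in this notebook and prints the type of each; the name `element_types` is introduced only for this illustration.

```python
a = "Hello world"      # str: a scalar
b = 5                  # int: a scalar
c = [1, 2, 3, 3, 2]    # list of integers
d = [1, "two", 3]      # list mixing integers and strings

# type() reports the class of each object
for value in (a, b, c, d):
    print(type(value).__name__, value)

# Each element of a mixed list keeps its own type
element_types = [type(item).__name__ for item in d]
print(element_types)
```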

In [None]:
c = [1, 2, 3, 3, 2]

In [None]:
print(c)

In [None]:
d = [1, "two", 3]

In [None]:
print(d)

In [None]:
# First value has index 0
c[0]

In [None]:
# Access in reverse via negative numbers 
d[-1]

In [None]:
# Get number of items in a list
len(c)

### TUPLES

A *tuple* is similar to a list, but its contents cannot be modified. Note that round brackets () are not needed, but can be used. 

In [None]:
e = 1,2,3

In [None]:
print(e)

In [None]:
f = (1,2,3)
print(f)

In [None]:
# Trying to modify a tuple after creation will cause an error
tup_2025 = (2, True, "adss") # Values of tuples can take multiple types 

In [None]:
tup_2025[2] = "ADSS" # TypeError: tuples do not support item assignment
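
The error raised by the cell above is a TypeError. As a hedged sketch, the example below catches that error and then builds a new tuple instead of modifying the old one; the names `tup`, `error_message` and `new_tup` are assumptions made for this illustration.

```python
tup = (2, True, "adss")   # hypothetical tuple mirroring the cell above

try:
    tup[2] = "ADSS"       # tuples do not allow item assignment
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# To "change" a tuple, build a new one from slices of the old one
new_tup = tup[:2] + ("ADSS",)
print(new_tup)
```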

The last two basic structures are sets and dictionaries (dicts). A set is like a list, but can only contain unique elements: here, "g" and "h" are identical. Sets are created using curly braces {}. A dict is an important data structure in Python. Similar to a set, it is created using curly braces, but it contains pairs of *keys* and *values*, separated by colons. 

### SETS 

In [None]:
g = {1,2,3,4}
print(g)

In [None]:
h = {1,1,2,2,2,3,3,4,4,4,4}
print(h)

In [None]:
# We can create sets from lists, strings or any other iterable value, using the set() function.
g = set(c)
print(g)

In [None]:
# The 'in' operator can be used to check if a value is in a set
1 in g

In [None]:
7 in g
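
A common use of sets is removing duplicates. The sketch below repeats the idea with the list "c" from earlier and with a string; the names `unique_values` and `letters` are made up for this example. Sets are unordered, so sorted() is used wherever a stable order matters.

```python
c = [1, 2, 3, 3, 2]

# Converting a list to a set drops the repeated values
unique_values = set(c)
print(len(c), len(unique_values))

# A string is iterable too, so set() collects its distinct characters
letters = set("hello")
print(sorted(letters))

# Membership tests return True or False
print(3 in unique_values, 7 in unique_values)
```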

### Dictionaries

In [None]:
i = {1: 'This', 2: 'is', 3: 'a', 4: 'dict'}

In [None]:
print(i)

In [None]:
d1 = {"Ireland": "Dublin", "Brazil":"Brasilia"}

In [None]:
print(d1)

In [None]:
# Access a value in a dictionary
d1["Ireland"]

In [None]:
# Dictionaries can be modified after creation. 
# New pairs can easily be added using the assignment operator. 
d1['Italy'] = 'Rome'

In [None]:
d1
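
Asking for a key that does not exist with square brackets raises a KeyError. A minimal sketch of some safer alternatives follows, re-creating `d1` so the block runs on its own; the key "France" and the fallback "unknown" are assumptions chosen for the example.

```python
d1 = {"Ireland": "Dublin", "Brazil": "Brasilia"}
d1["Italy"] = "Rome"

# 'in' checks whether a key is present
has_italy = "Italy" in d1

# .get() returns None (or a chosen default) instead of raising KeyError
missing = d1.get("France")
fallback = d1.get("France", "unknown")
print(has_italy, missing, fallback)

# del removes a key-value pair
del d1["Brazil"]
print(d1)
```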

The example below gives us an idea of how data science libraries such as pandas build on basic Python data structures. Here, we create a dictionary (dict) comprising three lists, each defined by a different key. If we then pass this dict to the DataFrame() function in pandas (remembering to use the pd. alias when calling the function), we can transform the dict into a dataframe: a much more useful data structure for data analysis.

In [None]:
data = {'year' : [1992, 1997, 2002, 2007, 2011, 2016, 2020],
        'party' : ['Fianna Fail', 'Fianna Fail', 'Fianna Fail', 'Fianna Fail', 'Fine Gael', 'Fine Gael', 'Fianna Fail'],
        'seats' : [68, 77, 81, 78, 76, 50, 38]}

In [None]:
print(data)

In [None]:
type(data)

In [None]:
data_frame = pd.DataFrame(data)
data_frame_test = pd.DataFrame(c)
print(data_frame_test)
print(data_frame)
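
To see what the dataframe represents, it can help to rebuild its rows by hand. The sketch below is a plain-Python illustration, not how pandas works internally: each key of the dict acts as a column, and the i-th entries of the three lists together form row i. The name `rows` is introduced only for this example.

```python
data = {'year': [1992, 1997, 2002, 2007, 2011, 2016, 2020],
        'party': ['Fianna Fail', 'Fianna Fail', 'Fianna Fail', 'Fianna Fail',
                  'Fine Gael', 'Fine Gael', 'Fianna Fail'],
        'seats': [68, 77, 81, 78, 76, 50, 38]}

# zip(*data.values()) pairs up the i-th entry of every list,
# and dict(zip(keys, values)) labels each entry with its column name
rows = [dict(zip(data.keys(), values)) for values in zip(*data.values())]

for row in rows:
    print(row)
```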

### Functions

We were introduced to functions in the section on help. An important difference between Python and R is the concept of *methods*. 
\
In R, every function is called the same way: the name of the function is followed by round brackets, and the *arguments* of the function are placed inside the brackets. 
\
Many Python functions operate in the same way. However, *methods* operate slightly differently. A method is a function which belongs to a specific *class* of object, and it is called by placing a dot (.) after the name of the object, followed by the method, followed by round brackets. 
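
A short sketch of the difference, using made-up objects `numbers` and `greeting`: sorted() is a function that takes the list as an argument, while .sort() is a method that belongs to the list itself.

```python
numbers = [3, 1, 2]

# Function: the object goes inside the brackets; a new list is returned
ordered_copy = sorted(numbers)
print(numbers, ordered_copy)   # the original list is unchanged so far

# Method: called with a dot on the object; sorts in place, returns None
result = numbers.sort()
print(numbers, result)

# Methods belong to a class: strings have .upper(), lists do not
greeting = "hello"
shout = greeting.upper()
has_upper = hasattr(numbers, "upper")
print(shout, has_upper)
```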

In [None]:
# Adding elements to a list
c.append(4)
print(c)

In [None]:
c.append(0)
print(c)

In [None]:
# Sort a list in place
c.sort()
print(c)

In [None]:
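# Which one? len() is a built-in function, not a method of lists
len(c)
c.len()  # AttributeError: 'list' object has no attribute 'len'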

In [None]:
# Reverse the order of a list
c.reverse()
print(c)

In [None]:
# Remove all items from a list
c.clear()
print(c)


In [None]:
# Methods can be used to combine and compare two sets
x = set([1,2,3,4])
y = set([3,4,5])

In [None]:
print(x.intersection(y))
print(x.union(y))
print(x.difference(y))

In [None]:
# Dictionaries have various associated functions to access the keys and/or values

# Get only the keys from a dictionary
d1.keys()

In [None]:
# Get only the values from a dictionary
d1.values()

In [None]:
# Get all key-value pairs as tuples
d1.items()

In [None]:
# Here, we make a list of numbers, 'a', which we then convert into a dataframe, an object belonging to the 
# pandas module, using the function DataFrame() and the pd alias. We then call the mean() *method*, a type 
# of function belonging to the dataframe class, on this object. 

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
a = pd.DataFrame(a)
a.mean()

In [None]:
# To call the help file on a method, we need to supply an object of the proper class. Here, for example,
# the first four calls to help on the .mean() method do not work. Calling help on methods can be extremely
# unintuitive, and is a good example of Python's steep learning curve. 

?mean
?mean()
?.mean()
help(mean)

In [None]:
?a.mean

In [None]:
# Next, we call the .mean() method on the 'seats' column of the 
# data_frame object we just created. Notice that to select the seats 
# column we can use either a dot (.) after the name of our object, 
# or square brackets [] with the column name inside, surrounded by 
# quotes ''. Note how this differs to R's $ operator. We then call 
# the mean() method by adding another dot(.), followed by round 
# brackets. 

data_frame.seats.mean()
data_frame['seats'].mean()

It is possible to *chain* methods together to perform fairly complex operations in a very concise manner. 
\
Say we wanted to calculate the average number of seats won by the largest party in the Dail following a general election, and to see whether there is any difference between the average number of seats won by Fianna Fail and Fine Gael. To achieve this, we first select our object and column of interest, data_frame['seats']. \
\
We then use the $\texttt{groupby}$ method, a function which *groups* the 'seats' column *by* the values in the 'party' column. 
\
If we then call the mean method on this newly grouped data, Python will return the average seats for each party.


In [None]:
data_frame['seats'].groupby(data_frame['party']).mean()
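
To make the grouping step less of a black box, the sketch below reproduces the same per-party averages with the standard library only. It illustrates the idea rather than how pandas implements groupby; the names `seats_by_party` and `party_means` are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

data = {'year': [1992, 1997, 2002, 2007, 2011, 2016, 2020],
        'party': ['Fianna Fail', 'Fianna Fail', 'Fianna Fail', 'Fianna Fail',
                  'Fine Gael', 'Fine Gael', 'Fianna Fail'],
        'seats': [68, 77, 81, 78, 76, 50, 38]}

# Step 1 (group): collect the seat counts under each party name
seats_by_party = defaultdict(list)
for party, seats in zip(data['party'], data['seats']):
    seats_by_party[party].append(seats)

# Step 2 (aggregate): take the mean of each group
party_means = {party: mean(seats) for party, seats in seats_by_party.items()}
print(party_means)
```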

In [None]:
# The same syntax works equally well with the std method, which returns the 
# standard deviation of the seats won by each party (not of the mean itself). 
# What does this tell us about the respective parties? 

data_frame['seats'].groupby(data_frame['party']).std()
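
By default, pandas' std() computes the *sample* standard deviation, dividing by n - 1. The stdev() function in the standard library uses the same definition, so the hedged sketch below should give matching figures; the list names `fianna_fail` and `fine_gael` are introduced only for this example and hold the seat counts from the data above.

```python
from statistics import stdev

# Seat counts for each party, copied from the 'data' dict above
fianna_fail = [68, 77, 81, 78, 38]
fine_gael = [76, 50]

# Sample standard deviation (divides by n - 1)
ff_std = stdev(fianna_fail)
fg_std = stdev(fine_gael)
print(round(ff_std, 2), round(fg_std, 2))
```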

In [None]:
# Below is a simple function, which demonstrates one of the most important
# aspects of Python as a programming language: indentation. 

def a_function(x):
    if x > 1:
        print('x is greater than 1')
    else:
        print('x is less than or equal to 1')

In [None]:
a_function(5)

In [None]:
a_function(-5)

In [None]:
# Whereas in R indentation is of little importance (it is mainly used
# as a convention to help with readability of code), in Python it is
# extremely important. Indentation tells Python which sections of 
# code belong together. Here, the "if" and "else" statements are both
# at the same level of indentation, as are the two calls to "print".
# This is a bit like telling Python "if the argument I give you, x,
# is bigger than 1, print the first statement. Otherwise, print the
# second one." In R, we would use curly braces {} to group these
# statements.
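
The effect of indentation is easiest to see when the same line is moved by one level. The two functions below are hypothetical examples written for this purpose; they differ only in how far `count += 1` is indented.

```python
def count_positive(values):
    count = 0
    for v in values:
        if v > 0:
            count += 1      # inside the "if": runs only for positive values
    return count            # aligned with "for": runs once, after the loop

def count_all(values):
    count = 0
    for v in values:
        if v > 0:
            pass            # nothing happens here
        count += 1          # aligned with "if": runs for every value
    return count

sample = [5, -5, 2]
print(count_positive(sample), count_all(sample))
```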

## Plotting

We use the $\texttt{matplotlib}$ library in Python to provide graphical capabilities, aliased as "plt".  \
Below, we create a simple line plot of the number of seats won by the leading party at each general election since 1992. Note how we build up our plot in stepwise fashion: we begin with a call to the plot() function, in which we specify our x and y axes, taking advantage of *positional* arguments (i.e. the *first* argument in the function is the x axis, the *second* is the y axis). We then call the xlabel(), ylabel(), and title() functions. This approach to plotting graphics is very similar to R's base graphics, which we briefly used yesterday. 

In [None]:
plt.plot(data_frame.year, data_frame.seats)
plt.xlabel("Year")
plt.ylabel("Seats")
plt.title("No. of seats won by leading party, General Elections 1992-2020")

### Activity

By recycling the code, try to create a dataframe and a line plot for the number of seats of the second largest party over the same period. You can find the figures at Wikipedia (https://en.wikipedia.org/wiki/D%C3%A1il_election_results)

In [None]:
data = {'year': [1992, 1997, 2002, 2007, 2011, 2016, 2020],
        'party': ['Fianna Fail', 'Fianna Fail', 'Fianna Fail', 'Fianna Fail', 'Fine Gael', 'Fine Gael', 'Fianna Fail'],
        'seats': [68, 77, 81, 78, 76, 50, 38]}

data_frame = pd.DataFrame(data)
print(data_frame)

data_frame.seats.mean()
data_frame['seats'].mean()

data_frame['seats'].groupby(data_frame['party']).mean()
data_frame['seats'].groupby(data_frame['party']).std()

plt.plot(data_frame.year, data_frame.seats)
plt.xlabel("Year")
plt.ylabel("Seats")
plt.title("No. of seats won by leading party, General Elections 1992-2020")

In [None]:
################
# And finally...
################

# There is much more to Python as a language, but the learning curve
# can be quite steep, and it is easy to become overwhelmed. For this
# reason we will leave it here for today - Congratulations, you 
# survived Python basics!

In [None]:
data = [1, 2, 3, 4, 5]
squares = [n**2 for n in data]

plt.plot(data, squares, marker="o")
plt.title("Numbers and their squares")
plt.xlabel("Number")
plt.ylabel("Square")
plt.show()