We always start a Python session by importing the packages that we want to work with. NumPy and pandas are the two basic packages for working with data; matplotlib (imported below as plt) is used for plotting.

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Some basic Python instructions

In [None]:
print(1+2)

In [None]:
x = 6
y = 3

In [None]:
print(x+y)

In [None]:
print(x*y)

In [None]:
print(x/y)
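Besides +, * and /, a few other arithmetic operators come up often. Here is a short sketch with arbitrary numbers. Note that / always returns a float, even when the division is exact, and that powers are written with ** because ^ means something else in Python.

```python
x = 6
y = 3

quotient = x / y      # true division: always returns a float
floor_q = x // y      # floor division: rounds down to a whole number
remainder = 7 % 3     # remainder after division
power = x ** 2        # exponentiation uses **, not ^

print(quotient, floor_q, remainder, power)   # 2.0 2 1 36
```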

The line import numpy as np imports NumPy, a favorite Python package for tasks like

1. working with arrays (vectors and matrices)

2. common mathematical functions like cos and sqrt

3. generating random numbers

4. linear algebra, etc.

After import numpy as np we have access to these attributes via the syntax np.attribute.
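For example, the sketch below reaches a constant, a mathematical function and the random number generator through the np. prefix. The seed 42 is an arbitrary choice that makes the random draws reproducible.

```python
import numpy as np

print(np.pi)                      # a constant stored in NumPy: 3.141592653589793
root = np.sqrt(16)                # a common mathematical function
print(root)                       # 4.0

# A random number generator; seeding it makes the draws reproducible
rng = np.random.default_rng(42)
draws = rng.normal(loc=0.0, scale=1.0, size=3)   # three standard normal draws
print(draws)
```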

Here are two more examples:

In [None]:
np.sqrt(4)   # square root

In [None]:
np.log(4)    # natural logarithm (base e)

As stated above, NumPy is a Python package.

Packages are used by developers to organize code they wish to share.

In fact, a package is just a directory containing

1. files with Python code, called modules in Python speak

2. possibly some compiled code that can be accessed by Python (e.g., functions compiled from C or FORTRAN code)

3. a file called `__init__.py` that specifies what will be executed when we type import package_name

You can check the location of NumPy's `__init__.py` file by running the code:

In [3]:
import numpy as np

print(np.__file__)

C:\Users\ozcan\PycharmProjects\UCLouvain-LSM-Python-Finance\.venv\Lib\site-packages\numpy\__init__.py

What if we had whole rows or columns of data rather than just two numbers? Ignore the details of the np.array code here for now.

In [None]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

In [None]:
print(x*y)
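With NumPy arrays, arithmetic operators work element by element, which is what makes them convenient for rows or columns of data. Plain Python lists behave differently: + joins two lists end to end and * between two lists raises a TypeError, so an element-by-element product needs a loop, as in this sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
prod = x * y                   # element by element: [2, 8, 18, 32, 50]
total = x + y                  # also element by element: [3, 6, 9, 12, 15]

# The same data as plain Python lists
x_list = [1, 2, 3, 4, 5]
y_list = [2, 4, 6, 8, 10]
joined = x_list + y_list       # + concatenates lists: 10 elements
prod_list = [a * b for a, b in zip(x_list, y_list)]   # element-wise product needs a loop

print(prod, prod_list)
```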

And what if we added some text instead of numbers?

In [None]:
x = "pink"
y = "pineapple"

In [None]:
print(x+y)

But not all text calculations work...

In [None]:
print(x*y)
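The last cell fails with a TypeError: Python defines + between two strings (concatenation) and * between a string and an integer (repetition), but not * between two strings. A sketch that catches the error instead of stopping the notebook:

```python
x = "pink"
y = "pineapple"

joined = x + y          # concatenation: 'pinkpineapple'
repeated = x * 3        # repetition: 'pinkpinkpink'

error_raised = False
try:
    x * y               # string * string is not defined
except TypeError as err:
    error_raised = True
    print("TypeError:", err)

print(joined, repeated)
```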

Storing data in a list

Items in lists are ordered, and duplicates are allowed.



In [None]:
x = [10, 'foo', False]
type(x)

The first element of x is an integer, the next is a string, and the third is a Boolean value.

To add a value to the end of a list, we can use the syntax list_name.append(some_value).



In [None]:
x

In [None]:
x.append(2.5)
x

Here append() is what’s called a method, which is a function “attached to” an object (in this case, the list x).

We’ll learn all about methods later on, but to give you some idea: Python objects such as lists and strings all have methods that are used to manipulate the data contained in the object.

String objects have string methods, list objects have list methods, etc.

Lists in Python are zero-based (as in C, Java or Go), so the first element is referenced by x[0]:

In [None]:
x[0]   # first element of x

In [None]:
x[1]   # second element of x
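Negative indices count from the end of the list, which saves computing the length yourself. A small sketch using the same list x after the append:

```python
x = [10, 'foo', False]
x.append(2.5)           # x is now [10, 'foo', False, 2.5]

first = x[0]            # 10
last = x[-1]            # 2.5: index -1 is the last element
n = len(x)              # 4 elements, so valid indices run from 0 to 3
same_last = x[n - 1]    # also 2.5; x[n] would raise an IndexError

print(first, last, same_last)
```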

The For Loop

Unlike most other languages, which mark code blocks with braces or keywords, Python knows where a code block starts and ends only from its indentation.

Let’s look at an example of a for loop:

In [None]:
animals = ['dog', 'cat', 'bird']
for animal in animals:
    print("The plural of " + animal + " is " + animal + "s")


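The general form is for variable_name in sequence: followed by an indented code block. The Python interpreter performs the following:

For each element of the sequence, it “binds” the name variable_name to that element and then executes the code block. In the example above, variable_name is animal and the sequence is the list animals.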
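To see how indentation delimits the loop body, consider this sketch: the indented lines run once per element, the unindented print runs once after the loop has finished, and the loop variable keeps its last value afterwards.

```python
animals = ['dog', 'cat', 'bird']
plurals = []

for animal in animals:          # animal is bound to 'dog', then 'cat', then 'bird'
    plural = animal + "s"       # indented: runs once per element
    plurals.append(plural)      # indented: also inside the loop

print(plurals)                  # not indented: runs once, after the loop ends
```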

Let’s do one more application before we turn to exercises.

In this application, we plot the balance of a bank account over time.



In [None]:


r = 0.025         # interest rate
T = 50            # end date
b = np.empty(T+1) # an empty NumPy array, to store all b_t
b[0] = 10         # initial balance

for t in range(T):
    b[t+1] = (1 + r) * b[t]

plt.plot(b, label='bank balance')
plt.legend()
plt.show()
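Because each step multiplies the balance by (1 + r), applying the rule t times gives the closed form b_t = (1 + r)^t * b_0. The sketch below reuses the parameter values from the cell above and checks that the loop and the closed form agree; np.allclose compares arrays up to floating-point rounding.

```python
import numpy as np

r = 0.025          # interest rate
T = 50             # end date
b0 = 10            # initial balance

# Loop version, as in the cell above
b = np.empty(T + 1)
b[0] = b0
for t in range(T):
    b[t + 1] = (1 + r) * b[t]

# Closed form b_t = (1 + r)**t * b_0, computed for every t at once
t_grid = np.arange(T + 1)
b_closed = (1 + r) ** t_grid * b0

print(np.allclose(b, b_closed))   # True
print(b[-1])                       # balance at date T
```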

Storing data in a dictionary

In [None]:
dict_grade = {"Ann": 75, "Paul": 74, "Mary": 80, "Bob": "I'm not a grade"}   # values can be of any type

In [None]:
dict_grade.get("Ann")

In [None]:
dict_grade.get("Bob")
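get() returns None when a key is missing and accepts a default value as a second argument; square-bracket lookup raises a KeyError instead. In the sketch, "Zoe" is a made-up name used to show a missing key:

```python
dict_grade = {"Ann": 75, "Paul": 74, "Mary": 80, "Bob": "I'm not a grade"}

ann = dict_grade["Ann"]                    # square brackets look up a key: 75
missing = dict_grade.get("Zoe")            # None: "Zoe" is not a key
with_default = dict_grade.get("Zoe", 0)    # 0: the default we supplied

try:
    dict_grade["Zoe"]                      # square brackets raise KeyError for a missing key
except KeyError:
    print("Zoe is not in the dictionary")

dict_grade["Zoe"] = 90                     # assignment adds a new key-value pair
print(ann, missing, with_default, dict_grade["Zoe"])   # 75 None 0 90
```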

What can I do to a given piece of data?

In [None]:
x = np.array([1, 2, 3, 4, 5])

dir(x)

In [None]:
x.mean()

In [None]:
x.max()

In [None]:
x.shape
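dir(x) lists everything attached to x, including many internal names that start with an underscore. A sketch that filters those out and checks that mean, max and shape are there; mean and max are methods (called with parentheses), while shape is a plain attribute:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# dir() returns attribute names; those starting with "_" are internal
public = [name for name in dir(x) if not name.startswith("_")]
print("mean" in public, "max" in public, "shape" in public)   # True True True

# mean and max are methods (called with parentheses); shape is a plain attribute
print(x.mean(), x.max(), x.shape)
```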

In [None]:
import numpy as np                     # Load the library

a = np.linspace(-np.pi, np.pi, 100)    # Create even grid from -π to π
a

In [None]:
b = np.cos(a)                          # Apply cosine to each element of a
c = np.sin(a)                          # Apply sin to each element of a
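np.cos works on the whole array in one call. As a check, the sketch below compares it with an explicit loop over the standard-library math.cos, which only accepts one number at a time:

```python
import math
import numpy as np

a = np.linspace(-np.pi, np.pi, 100)      # 100 evenly spaced points from -π to π
b = np.cos(a)                            # one call applies cos to every element

# Equivalent element-by-element loop with the standard-library math.cos
b_loop = np.array([math.cos(value) for value in a])

print(a.shape, b.shape)                  # (100,) (100,)
print(np.allclose(b, b_loop))            # True
```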

Defining your own functions

In [None]:
def greetings(name):
    '''Print a short greeting addressed to name.'''
    print('Hello, ' + name + '. Good afternoon!')


In [None]:
greetings('Suwan')

In [None]:
def celsius(fahrenheit):

    '''A function to convert a temperature in Fahrenheit into Celsius'''

    f = float(fahrenheit)
    c = (f - 32) * 5 / 9
    print(f'{round(f, 2)} Fahrenheit, converted to Celsius, is: {round(c, 2)} Celsius.')

In [None]:
celsius(92)
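celsius prints its result but does not return it, so the converted value cannot be reused in later calculations. A sketch of a variant that returns the value instead; the name fahrenheit_to_celsius is hypothetical, chosen for this example:

```python
def fahrenheit_to_celsius(fahrenheit):
    """Return the Celsius equivalent of a Fahrenheit temperature (hypothetical helper)."""
    return (float(fahrenheit) - 32) * 5 / 9

c = fahrenheit_to_celsius(92)
print(round(c, 2))                                 # 33.33

# The returned value can be reused, e.g. in a list comprehension
temps = [32, 212, 92]
converted = [round(fahrenheit_to_celsius(t), 2) for t in temps]
print(converted)                                   # [0.0, 100.0, 33.33]
```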

In [None]:
## Installing a package

!pip install pandas

Now let's start to really use Python for data analysis. Here's some finance data I got from https://finance.yahoo.com/ for Tesla (TSLA) and Ford (F). The big thing to remember: the data files need to be in the same folder as the code (the notebook's working directory). In Google Colab, upload the data files to that folder. If you open the files beforehand, make sure not to save them, as saving in a non-English version of Excel might change the file format.

In [None]:

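from google.colab import files   # this module is only available in Google Colab
uploaded = files.upload()         # opens a file picker: select TSLA.xlsx and Ford.xlsx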


In [None]:
tesla = pd.read_excel('TSLA.xlsx') # by convention we tend to use lowercase letters when naming datasets
tesla.head(3)

In [None]:
ford = pd.read_excel('Ford.xlsx')
ford.head()

Let's explore these datasets

In [None]:
ford.info()

In [None]:
ford.describe()

Let's look at how to edit this dataframe:
1. Add a new column
2. Delete a column
3. Create a sub-dataframe

In [None]:
ford['new_close'] = ford['Close'] + 1   # a throwaway column: every closing price plus 1

In [None]:
ford.head()

In [None]:
# add a new column

ford["double_close"] = ford["Close"] * 2

ford.head()

In [None]:
# delete a column

ford = ford.drop(["new_close"], axis=1)

ford.head()

In [None]:
# create a sub-dataframe

mini_ford = ford[["Exchange Date", "Open", "Close"]].copy()

mini_ford.head()
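The same three edits on a small made-up DataFrame (the prices are invented for illustration, not taken from the Ford file). drop(columns=[...]) is equivalent to drop([...], axis=1), and .copy() makes the sub-dataframe independent of the original, so later changes to one do not affect the other:

```python
import pandas as pd

# Invented prices, laid out like the Ford file
prices = pd.DataFrame({
    "Exchange Date": ["2021-01-04", "2021-01-05", "2021-01-06"],
    "Open": [8.8, 8.9, 9.1],
    "Close": [8.9, 9.0, 9.2],
    "Volume": [100, 120, 90],
})

prices["double_close"] = prices["Close"] * 2                 # 1. add a column
prices = prices.drop(columns=["Volume"])                      # 2. delete a column
mini = prices[["Exchange Date", "Open", "Close"]].copy()      # 3. sub-dataframe

mini["Close"] = 0.0               # changes mini only, because of .copy()
print(prices.columns.tolist())
print(prices["Close"].tolist())   # [8.9, 9.0, 9.2]: unchanged
```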

Now let's tidy up the data to suit our needs

1. Fix dates
2. Drop unneeded columns
3. Change names
4. Add prefixes


1. Fix dates

In [None]:
# dates are messy: convert the 'Exchange Date' values to proper datetime values.
# (infer_datetime_format is deprecated since pandas 2.0, where format inference is the default.)

tesla['date'] = pd.to_datetime(tesla['Exchange Date'])
ford['date'] = pd.to_datetime(ford['Exchange Date'])
tesla['date'].head()

In [None]:
# now that we have the dates right, let's set date as the index and see what that means

tesla.set_index('date', inplace=True)
ford.set_index('date', inplace=True)

tesla.head()

Now drop unneeded columns. For example, the 'Exchange Date' column is redundant once the dates are in the index.
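A sketch of the date steps on a small invented table: convert the dates, move them into the index, and drop the now-redundant 'Exchange Date' column. The sort_index() call is an extra step not in the cells above; it only matters if an export lists the newest dates first, which is assumed here for illustration:

```python
import pandas as pd

# Invented rows, newest first, in the layout of the exported files
df = pd.DataFrame({
    "Exchange Date": ["2021-01-06", "2021-01-05", "2021-01-04"],
    "Close": [9.2, 9.0, 8.9],
})

df["date"] = pd.to_datetime(df["Exchange Date"])    # text -> datetime values
df = df.set_index("date").sort_index()              # dates become the index, oldest first
df = df.drop(columns=["Exchange Date"])             # the original column is now redundant

print(df)
```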

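Then rename the columns in each dataset before merging the two datasets, so that columns with the same name (both datasets have a Close column) can still be told apart after the merge.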

In [None]:
# add a prefix (this isn't particularly important, I just want to show you how flexible Python is for working with data)

ford = ford.add_prefix('ford_')
tesla = tesla.add_prefix('tesla_')

tesla.head()

Those column names don't look too good, so let's rename them

In [None]:
# rename() ignores keys that are not existing columns; if a file has both 'Close' and
# 'Adj Close', drop one first, otherwise two columns end up with the same name.
tesla.rename(columns={'tesla_Adj Close': 'tesla_Close'}, inplace=True)
ford.rename(columns={'ford_Adj Close': 'ford_Close'}, inplace=True)

tesla.head()

In [None]:
# merge the two datasets together

cars = pd.merge(tesla, ford, left_index=True, right_index=True)

cars.head()
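pd.merge with left_index=True and right_index=True matches rows on the date index and, by default (how='inner'), keeps only dates present in both datasets. The invented example below also shows why the prefixes help: without them, the two Close columns clash and pandas appends the suffixes _x and _y.

```python
import pandas as pd

dates = pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"])
tesla_toy = pd.DataFrame({"Close": [100.0, 102.0, 101.0]}, index=dates)   # invented prices
ford_toy = pd.DataFrame({"Close": [10.0, 10.5]}, index=dates[:2])          # one date fewer

# Same column name in both: pandas adds the default suffixes _x and _y
clash = pd.merge(tesla_toy, ford_toy, left_index=True, right_index=True)
print(clash.columns.tolist())     # ['Close_x', 'Close_y']

# With prefixes the names stay readable; the inner join keeps only the 2 shared dates
merged = pd.merge(tesla_toy.add_prefix("tesla_"), ford_toy.add_prefix("ford_"),
                  left_index=True, right_index=True)
print(merged.columns.tolist())    # ['tesla_Close', 'ford_Close']
print(len(merged))                # 2
```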

Let's save our new dataset in case anything goes wrong

In [None]:
cars.to_csv('Cars.csv')

In [None]:
# we can now reopen it from our files if we wish (no need to, though: it's already open)
# read_csv options: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
# parse_dates=True turns the 'date' index back into datetime values instead of plain text

cars = pd.read_csv('Cars.csv', index_col='date', parse_dates=True)
cars.head()
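A CSV file stores everything as text, so without parse_dates the dates come back as strings. A round-trip sketch using an in-memory buffer (io.StringIO) instead of a real file, with an invented one-column table:

```python
import io
import pandas as pd

df = pd.DataFrame(
    {"tesla_Close": [100.0, 102.0]},
    index=pd.DatetimeIndex(["2021-01-04", "2021-01-05"], name="date"),
)

buffer = io.StringIO()
df.to_csv(buffer)                  # the index is written as a column called "date"

buffer.seek(0)
as_text = pd.read_csv(buffer, index_col="date")
buffer.seek(0)
as_dates = pd.read_csv(buffer, index_col="date", parse_dates=True)

print(type(as_text.index[0]))      # dates read back as plain strings
print(type(as_dates.index))        # a DatetimeIndex
```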

In [None]:
cars.info()

We'll create some new columns that will be useful in our analysis:

1. monthly returns
2. a static standard deviation measure

In [None]:
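# monthly returns (shift(1) assumes the rows run from oldest to newest, so sort by date first)

cars = cars.sort_index()
cars['tesla_returns'] = cars['tesla_Close'] / cars['tesla_Close'].shift(1) - 1
cars['ford_returns'] = cars['ford_Close'] / cars['ford_Close'].shift(1) - 1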

# or we can

# cars['tesla_returns'] = cars['tesla_Close'].pct_change()
# cars['ford_returns'] = cars['ford_Close'].pct_change()

cars.head()
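The simple return is r_t = P_t / P_{t-1} - 1, where P_t is the closing price in period t. shift(1) moves every price down one row, so each row is divided by the previous one; the first row has no previous price and gets NaN. A sketch with invented prices confirming that pct_change() gives the same numbers:

```python
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0, 108.9])     # invented closing prices

returns_shift = close / close.shift(1) - 1          # r_t = P_t / P_{t-1} - 1
returns_pct = close.pct_change()                    # built-in equivalent

print(returns_shift.round(4).tolist())              # [nan, 0.1, -0.1, 0.1]
print((returns_shift - returns_pct).abs().max())    # largest difference (NaN rows skipped)
```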

In [None]:
# static standard deviation: a single number per stock, repeated in every row

cars['tesla_std'] = cars['tesla_returns'].std()
cars['ford_std'] = cars['ford_returns'].std()

cars.head()
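pandas' std() computes the sample standard deviation (dividing by n - 1, i.e. ddof=1) and skips NaN values such as the first return, whereas NumPy's np.std defaults to the population version (ddof=0). A sketch with invented returns comparing the two:

```python
import numpy as np
import pandas as pd

returns = pd.Series([np.nan, 0.1, -0.1, 0.1])     # invented returns; first missing, as after shift(1)

sd_pandas = returns.std()                          # sample std, NaN skipped
values = returns.dropna().to_numpy()
sd_numpy = np.std(values)                          # population std (ddof=0)
sd_numpy_sample = np.std(values, ddof=1)           # matches pandas

print(sd_pandas, sd_numpy, sd_numpy_sample)
```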

Let's now do some initial charts

In [None]:
# import packages

import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
cars['tesla_returns'].plot();

In [None]:
cars['tesla_Close'].plot();

In [None]:
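# improved charting: rolling mean and standard deviation over a 12-row window
# (the window counts rows, so with monthly data this is a 12-month window)

window = 12
rolling = cars['tesla_returns'].rolling(window, center=True)

data = pd.DataFrame({'monthly returns': cars['tesla_returns'],
                     f'{window}m rolling mean': rolling.mean(),
                     f'{window}m rolling std': rolling.std()})
ax = data.plot(style=['-', '--', ':'])
ax.lines[0].set_alpha(0.3)   # make the raw returns line semi-transparent
ax.tick_params(axis='x', rotation=45)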
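rolling(n) counts rows, not calendar time. center=True labels each window at its middle row, which means the value for a given month also uses later months. A sketch on a short invented series comparing a trailing and a centered 3-row window:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

trailing = s.rolling(3).mean()               # mean of the current row and the 2 before it
centered = s.rolling(3, center=True).mean()  # mean of the previous, current and next row

print(trailing.tolist())   # [nan, nan, 2.0, 3.0, 4.0]
print(centered.tolist())   # [nan, 2.0, 3.0, 4.0, nan]
```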

And even better charts

In [None]:
# install the necessary packages

import sys
!{sys.executable} -m pip install "notebook>=5.3" "ipywidgets>=7.5"
!{sys.executable} -m pip install plotly==4.14.3

In [None]:
# import the relevant package

import plotly.express as px

# create the charts

fig = px.line(cars, x=cars.index, y="tesla_Close", title='Tesla stock price')
fig.show()

In [None]:
fig = px.bar(cars, x=cars.index, y="tesla_returns", title='Tesla monthly returns')
fig.show()