# Introduction to Python

This is a Jupyter notebook, a convenient way to run Python code interactively in the browser. *Interactively* means that at any time we can stop the process to inspect individual outputs, change variable expressions, or plot intermediate results. This notebook can also be run on ERDA and Google Drive, so you do not need Python installed on your own computer to run the files, although I recommend having a local installation up and running as your primary place for coding. Python and Jupyter notebooks are available for free on all platforms (Windows, Mac, Linux). I recommend installing the [Anaconda distribution](https://www.anaconda.com/products/individual) of Python, which comes with many useful packages pre-installed, as well as editors designed for Python (Spyder and JupyterLab).

This notebook is meant as a very basic introduction to Python, and specifically to how to load and save data from files. It is not meant as a comprehensive introduction to Python, but rather as a quick guide to get you started.
***


Valentina Espinoza F. (University of Copenhagen)  
10th January 2023 (latest update)

## Write data

We start by writing a file, `randomnumbers.dat`, containing some random numbers. First we create and open `randomnumbers.dat` in write mode. Note that the file is only open inside the block introduced by the `with ... as file` statement, i.e. the file is closed automatically when the block is exited. While it is open, we run a for loop to generate 10 rows of random numbers drawn from different probability distributions (uniform, Gaussian, etc.).
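
To see the automatic closing in action, here is a minimal sketch. It assumes a hypothetical scratch file `demo.txt` inside a temporary directory (not part of this notebook's data), and checks the file's `closed` attribute inside and outside the `with` block:

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory, so nothing is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")

    with open(path, "w") as file:
        file.write("first line\n")
        open_inside = not file.closed   # the file is open inside the block

    closed_after = file.closed          # the file is closed once the block ends

    # Writing to a closed file raises a ValueError
    try:
        file.write("second line\n")
        write_failed = False
    except ValueError:
        write_failed = True

print(open_inside, closed_after, write_failed)
```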

We use the method `file.write()` to add lines to our file. It takes a string as its argument, so we have to convert the numbers to strings before we can write them to the file.
To do so, we use f-strings (string literals prefixed with `f`, as in `f"..."`), a shorthand for the string `format()` method that is used extensively in Python. The string contains placeholders, marked with curly brackets `{}`, for the variables that we want to insert. A placeholder can also control how a number is formatted, e.g. the number of decimals. In the example below, we use the format specifier `:.4f` inside the placeholder to insert a *floating* point number with 4 decimals. You can try replacing the `f` with an `e` to see what happens.
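
As a small illustration of the format specifiers described above, the sketch below formats one made-up number (the variable `value` is an assumption for this example) with `.4f` and `.4e`, and shows that the `format()` method gives the same result as the f-string:

```python
value = 1234.56789  # made-up number, only for illustration

fixed = f"{value:.4f}"               # fixed-point notation with 4 decimals
scientific = f"{value:.4e}"          # scientific notation with 4 decimals
with_format = "{:.4f}".format(value) # same specifier through str.format()

print(fixed)        # 1234.5679
print(scientific)   # 1.2346e+03
print(with_format)  # 1234.5679
```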

In [54]:
# Get the r object from the numpy random module
import numpy as np
r = np.random

with open('randomnumbers.dat', 'w' ) as file: 
    
    header_line = "Uniform\tGaussian\tPoisson\tExponential\tPower\n"
    print(header_line[:-1]) # do not print the trailing '\n' (a single character)
    file.write(header_line)
    
    for i in range(10): 

        # Numbers distributed according to the following density functions: uniform, gaussian, poissonian, exponential, power
        new_line = f"{r.uniform():.4f}\t{r.normal():.4f}\t{r.poisson(10.0):.0f}\t{r.exponential():.4f}\t{r.power(1):.4f}\n"
        print(new_line[:-1]) # do not print the trailing '\n' (a single character)
        file.write(new_line)

Uniform	Gaussian	Poisson	Exponential	Powe
0.0452	-0.0736	6	0.4795	0.969
0.8421	-0.2822	14	0.5358	0.273
0.0564	0.2594	10	2.8962	0.849
0.2473	0.3023	9	3.0802	0.606
0.2286	1.0604	9	1.1134	0.520
0.7723	1.5414	10	0.8028	0.560
0.8767	-1.0204	11	0.2395	0.136
0.0145	-0.2691	8	0.4980	0.437
0.9042	0.2001	13	0.9731	0.862
0.9495	-2.1709	10	0.6146	0.980


If we have an array of data, we can use NumPy's `savetxt()` function to write it to a file. `savetxt()` takes two required arguments: the name of the file to write to and the array of data. It also accepts optional keyword arguments, such as the format string, e.g. `fmt="%.4f"` to write floating point numbers with 4 decimals.

In [55]:
num_rows = 10
my_random_array = np.empty((num_rows, 5))
header_line = "Uniform\tGaussian\tPoisson\tExponential\tPower"

# Generate random numbers for each column and row
my_random_array[:, 0] = r.uniform(size=num_rows)
my_random_array[:, 1] = r.normal(size=num_rows)
my_random_array[:, 2] = r.poisson(lam=10, size=num_rows)
my_random_array[:, 3] = r.exponential(size=num_rows)
my_random_array[:, 4] = r.power(a=1, size=num_rows)

np.savetxt('randomnumbers.dat', my_random_array, fmt='%.4f', delimiter='\t', header=header_line, comments="")  # values will not be the same as above, because they are randomly generated
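
The `fmt` argument can also be a sequence with one format per column, which may be useful when some columns hold counts rather than decimals. A minimal sketch, assuming a small made-up two-column array and an in-memory `io.StringIO` buffer standing in for a real file:

```python
import io
import numpy as np

# Made-up data: one column of decimals, one column of counts
data = np.array([[0.5, 7.0],
                 [0.25, 10.0]])

buffer = io.StringIO()  # in-memory stand-in for a file on disk
np.savetxt(buffer, data, fmt=["%.4f", "%d"], delimiter="\t",
           header="Uniform\tPoisson", comments="")

text = buffer.getvalue()
print(text)
```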

Similarly, if you have a pandas DataFrame, you can use the `to_csv()` method to write the data to a file.

In [65]:
# If you were to have the data as a pandas DataFrame
import pandas as pd

my_random_df = pd.DataFrame(my_random_array, columns=['Uniform', 'Gaussian', 'Poisson', 'Exponential', 'Power'])

with open('randomnumbers.dat', 'w' ) as file: 
    my_random_df.to_csv(file, sep='\t', index=False, lineterminator='\n', float_format='%.4f')
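
Note that `float_format` rounds the values as they are written, so the file holds fewer digits than the DataFrame in memory. The sketch below illustrates this with a small made-up DataFrame, using an in-memory `io.StringIO` buffer instead of a file on disk (both are assumptions for this example):

```python
import io
import pandas as pd

# Made-up data, only for illustration
df = pd.DataFrame({"Uniform": [0.12345678, 0.5], "Poisson": [7, 10]})

# Write to an in-memory buffer with 4 decimals for floating point columns
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", index=False, float_format="%.4f")
text = buffer.getvalue()

# Read the text back into a new DataFrame
df_back = pd.read_csv(io.StringIO(text), sep="\t")

print(text)
print(df_back)
```

The integer column is written unchanged, while the first value of `Uniform` comes back as `0.1235`.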

All of the methods above will only work for regularly-structured data, i.e. data that can be represented as a table with rows and columns. If you have more complex data, e.g. a DataFrame of DataFrames, you can save your data as a pickle file using the `pickle.dump()` function. Pickle files are binary files, and can only be read by Python.

In [70]:
import pickle

student1_dict = {'name': 'John', 'age': 23, 'city': 'Lake City'}
student2_dict = {'name': 'Anna', 'age': 21, 'city': 'Ontario'}
students = [student1_dict, student2_dict]

# Save the list of dictionaries using pickle
with open('students_2024.obj', 'wb') as file:
    pickle.dump(students, file)

## Read data

Often you will have to read and write text files (.csv, .txt, etc.). To provide an example, we will read the same file we created above, `randomnumbers.dat`.

In [56]:
# Declaring lists that will contain the columns of random numbers
uni = []
gauss = []
pois = []
exp = []
power = []

# Read the file
with open('randomnumbers.dat', 'r' ) as file: 
    
    # Skip the first line (header) with the `next()` command
    next(file)  
    
    # Loop through each line in the file
    for line in file: 

        # Strip leading and trailing whitespace (e.g. the newline '\n') using `strip()`
        strip_line = line.strip()
        
        # Split the line at whitespace (here, the tabs '\t') into a list of strings using `split()`
        format_line = strip_line.split()

        # Append the elements to the corresponding list
        uni.append( float(format_line[0]) )        # "uni" gets the first element in the list
        gauss.append( float(format_line[1]) )      # "gauss" gets the second element ...
        pois.append( float(format_line[2]) )
        exp.append( float(format_line[3]) )
        power.append( float(format_line[4]) )
                
        
        
print("Uniformly distributed random numbers: ", uni)

Uniformly distributed random numbers:  [0.4926, 0.3288, 0.6334, 0.2401, 0.0759, 0.1289, 0.128, 0.1519, 0.1388, 0.6409]
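
To make the parsing steps concrete, here is a minimal sketch applied to a single line. The string `line` is a made-up example in the same tab-separated layout as the file, not a line read from disk:

```python
# Made-up line in the same layout as the rows of randomnumbers.dat
line = "0.4926\t-0.5139\t7.0000\t0.7606\t0.5936\n"

stripped = line.strip()              # removes the trailing newline
fields = stripped.split()            # splits at whitespace (here, tabs)
values = [float(x) for x in fields]  # converts each string to a float

print(fields)
print(values)
```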


All of the above can be done more easily with NumPy's `np.loadtxt()` function:

In [57]:
uni_np, gauss_np, pois_np, exp_np, power_np = np.loadtxt('randomnumbers.dat', unpack=True, skiprows=1)
print("Uniformly distributed random numbers: ", uni_np)

Uniformly distributed random numbers:  [0.4926 0.3288 0.6334 0.2401 0.0759 0.1289 0.128  0.1519 0.1388 0.6409]
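
The `unpack=True` argument is what allows one variable per column. The sketch below compares the two behaviours on a small made-up table, read from an in-memory `io.StringIO` buffer rather than a file (the text and the names `col_a` and `col_b` are assumptions for this example):

```python
import io
import numpy as np

# Made-up table with a header line and two columns
text = "A\tB\n1.0\t2.0\n3.0\t4.0\n5.0\t6.0\n"

# Without unpack: one 2D array with shape (rows, columns)
table = np.loadtxt(io.StringIO(text), skiprows=1)

# With unpack=True: the array is transposed, so each column can be
# assigned to its own variable
col_a, col_b = np.loadtxt(io.StringIO(text), skiprows=1, unpack=True)

print(table.shape)  # (3, 2)
print(col_a)        # [1. 3. 5.]
print(col_b)        # [2. 4. 6.]
```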


We could also use the pandas function `pd.read_csv()` to read the file. This function is very powerful, and it preserves the column names when creating a `pd.DataFrame` object:

In [66]:
import pandas as pd

rand_nums_df = pd.read_csv('randomnumbers.dat', sep='\t')
rand_nums_df

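|   | Uniform | Gaussian | Poisson | Exponential | Power |
|---|---------|----------|---------|-------------|-------|
| 0 | 0.4926 | -0.5139 | 7.0 | 0.7606 | 0.5936 |
| 1 | 0.3288 | -1.0592 | 10.0 | 0.2777 | 0.6791 |
| 2 | 0.6334 | -0.0627 | 11.0 | 0.3137 | 0.7892 |
| 3 | 0.2401 | 0.9551 | 4.0 | 0.4737 | 0.4984 |
| 4 | 0.0759 | -0.9857 | 7.0 | 0.0203 | 0.0869 |
| 5 | 0.1289 | 0.504 | 9.0 | 0.3887 | 0.5371 |
| 6 | 0.128 | -0.5303 | 11.0 | 0.2376 | 0.5868 |
| 7 | 0.1519 | -0.7929 | 10.0 | 0.3967 | 0.7454 |
| 8 | 0.1388 | -0.107 | 10.0 | 0.1276 | 0.4317 |
| 9 | 0.6409 | -1.0352 | 5.0 | 2.2121 | 0.1276 |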


In [59]:
uni_pd = rand_nums_df['Uniform']
print("Uniformly distributed random numbers: ", uni_pd.values)

Uniformly distributed random numbers:  [0.4926 0.3288 0.6334 0.2401 0.0759 0.1289 0.128  0.1519 0.1388 0.6409]


To read a `pickle` file back, we can use the `pickle.load()` function.

In [71]:
with open('students_2024.obj', 'rb') as datafile:
    pickle_data = pickle.load(datafile)
    
print(type(pickle_data))
pickle_data

<class 'list'>


[{'name': 'John', 'age': 23, 'city': 'Lake City'},
 {'name': 'Anna', 'age': 21, 'city': 'Ontario'}]
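
Pickle is not limited to lists of dictionaries. As a minimal sketch, the example below round-trips a made-up nested object (the `record` dictionary is an assumption for this example) through `pickle.dumps()` and `pickle.loads()`, the in-memory counterparts of `pickle.dump()` and `pickle.load()`:

```python
import pickle

# Made-up nested object: a dictionary holding a list and another dictionary
record = {"name": "John", "scores": [1.5, 2.5], "info": {"city": "Lake City"}}

blob = pickle.dumps(record)    # serialize to a bytes object instead of a file
restored = pickle.loads(blob)  # rebuild the object from the bytes

print(type(blob))
print(restored == record)  # True: same content
print(restored is record)  # False: it is a new, independent object
```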