***
# 2. Introduction to Python in Jupyter - The basics - Part I
>-basic coding (accessing directories, commenting, help,...) <br>
-relative file paths <br>
-indentation
***

## 2.1 Basic coding tips

In [1]:
# Use the number symbol to comment your code
# To comment out multiple lines, select them and press CTRL + / (CMD + / on macOS)
# Undo the commenting by selecting the lines and pressing CTRL + / again

# Use a backslash to split a long line of code over several lines
# For example

a1 = '1' + '2' + '3' + \
    '4' + '5'

# is the same as 


a2 = ('1' + '2' + '3' +
    '4' + '5')
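Parentheses, square brackets and curly braces all allow this implicit line continuation, and PEP 8 prefers it over the backslash. A small sketch (the names `parts` and `total` are only for illustration):

```python
# Inside (), [] or {} Python keeps reading until the bracket is closed,
# so no backslash is needed at the end of each line.
parts = [
    '1', '2', '3',
    '4', '5',
]
total = ''.join(
    parts
)
print(total)  # prints 12345
```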

In [2]:
print (a1)
print (a2)

12345
12345


In [3]:
# But this gives a SyntaxError: without a backslash or parentheses, the statement ends at the line break
a3 = '1' + '2' + '3' + 
    '4' + '5'

print(a3)

SyntaxError: invalid syntax (3041281332.py, line 2)

In [None]:
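# !pwd prints the current working directory (in Jupyter, usually the folder that contains your notebook)
# The ! prefix runs a command in the underlying system shell
# (pwd is a Unix command; the IPython magic %pwd works on every operating system)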

!pwd

In [None]:
# Some other terminal commands 
# Both commands list the files in the current directory (ls is the Unix command, dir the Windows one)
!ls
!dir

In [None]:
# You can also use a Python function to get your working directory
# For this, you need to import the os module

import os
os.getcwd()
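The shell commands above depend on the operating system, while the standard-library functions below behave the same everywhere. A minimal sketch using `os` and `pathlib`; the variable name `entries` is only illustrative:

```python
import os
from pathlib import Path

# Current working directory, the Python equivalent of !pwd
print(os.getcwd())
print(Path.cwd())  # the same location as a Path object

# Names of the entries in the current directory, similar to !ls or !dir
entries = sorted(os.listdir("."))
print(entries)
```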

In [4]:
# This is how you ask for help
?


IPython -- An enhanced Interactive Python

IPython offers a fully compatible replacement for the standard Python
interpreter, with convenient shell features, special commands, command
history mechanism and output results caching.

At your system command line, type 'ipython -h' to see the command line
options available. This document only describes interactive features.

GETTING HELP
------------

Within IPython you have various way to access help:

  ?         -> Introduction and overview of IPython's features (this screen).
  object?   -> Details about 'object'.
  object??  -> More detailed, verbose information about 'object'.
  %quickref -> Quick reference of all IPython specific syntax and magics.
  help      -> Access Python's own help system.

If you are in terminal IPython you can quit this screen by pressing `q`.


MAIN FEATURES
-------------

* Access to the standard Python help with object docstrings and the Python
  manuals. Simply type 'help' (no quotes) to invoke it.

* Ma

In [5]:
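# This is how to ask for help about a module (the same works for functions, packages, etc.)
# Note: the module must be imported first; the output below is what you get if the import cell above has not been run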
os?

Object `os` not found.
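The `?` syntax only exists in IPython and Jupyter. In a plain Python script you can get similar documentation with the built-in `help()` function, or read the raw docstring from the `__doc__` attribute. A minimal sketch, using `os.getcwd` as an example:

```python
import os

# help(os.getcwd) would print formatted documentation;
# __doc__ holds the raw docstring as an ordinary string.
doc = os.getcwd.__doc__
print(doc)
```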


***
## 2.2 Relative file paths

### **Always** store your data and resources in the same folder (or subfolder) as your .ipynb file (your jupyter notebook)! 
### And always use relative paths 
If you have a Jupyter notebook that uses a specific dataset, make sure that you save the data in the same folder as your notebook.<br>
If you work from a local drive, in Windows (or any other OS), you would usually copy the file path directly from your file manager (`File Explorer` in Windows). <br>
In Windows, it would look like this: `C:\Training\data\myData.csv` <br>

Windows uses a backslash character between folder names (most other operating systems use a forward slash). <br>
If you paste a Windows path with backslashes into a Python string, Python may misread it or raise an error, and your file will not be found. <br>
Why? In Python strings, the backslash starts an *escape sequence*: for example `\n` means a new line and `\t` means a tab (and there are other uses, but who would memorize those, right?!). <br>
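To see the problem in action, here is a sketch with a made-up path, `C:\new\table.csv`, chosen because it happens to contain `\n` and `\t`. (A path such as `"C:\Users\me"` is even rejected outright with a SyntaxError, because `\U` starts a Unicode escape.)

```python
# "\n" is a newline and "\t" is a tab, so this path is silently corrupted
bad = "C:\new\table.csv"
print(len(bad))  # 14: each escape sequence counts as ONE character
print(bad)       # prints "C:", a line break, then "ew", a tab and "able.csv"

# With an r prefix (a raw string) every backslash is kept exactly as typed
good = r"C:\new\table.csv"
print(len(good))  # 16
print(good)       # prints C:\new\table.csv
```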

#### **So how do we solve this?**

**There are 3 "manual" ways:**
1. double all your backslashes, so `C:\Training\data\myData.csv` would become `C:\\Training\\data\\myData.csv`
2. change your backslashes into forward slashes, so `C:\Training\data\myData.csv` would become `C:/Training/data/myData.csv`
3. Ugh... Imagine you have 100 file paths with 20 backslashes each to convert. OK, maybe you could do a find and replace. OR, you can put an `r` in front of the string to make it a *raw string*, in which backslashes are kept as-is! <br>
    -It would look like this: file = `r'C:\Training\data\myData.csv'` or `r"C:\Training\data\myData.csv"`
    
*Let's look at these manual solutions!*
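As a quick check, the sketch below writes the example path from the list in all three ways and compares them. `PureWindowsPath` from the standard `pathlib` module interprets a string as a Windows path without touching the disk, so it runs on any OS:

```python
from pathlib import PureWindowsPath

doubled = "C:\\Training\\data\\myData.csv"   # 1. doubled backslashes
forward = "C:/Training/data/myData.csv"      # 2. forward slashes
raw = r"C:\Training\data\myData.csv"         # 3. raw string

print(doubled == raw)  # True: both strings contain single backslashes
print(forward == raw)  # False: the characters differ...
# ...but interpreted as Windows paths, they point to the same location
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True
```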

In [6]:
# First, we need to import a few libraries to be able to read the data
import numpy as np # numpy is a python library for working with arrays, if you want more info about it write np? in a code cell.
import pandas as pd # for data analysis, for more info write pd? in a code cell.

In [7]:
pd?

Type:        module
String form: <module 'pandas' from '/opt/conda/lib/python3.9/site-packages/pandas/__init__.py'>
File:        /opt/conda/lib/python3.9/site-packages/pandas/__init__.py
Docstring:  
pandas - a powerful data analysis and manipulation library for Python

**pandas** is a Python package providing fast, flexible, and expressive data
structures designed to make working with "relational" or "labeled" data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, **real world** data analysis in Python. Additionally, it has
the broader goal of becoming **the most powerful and flexible open source data
analysis / manipulation tool available in any language**. It is already well on
its way toward this goal.

Main Features
-------------
Here are just a few of the things that pandas does well:

  - Easy handling of missing data in floating point as well as non-floating
    point data.
  - Size 

In [8]:
# We will read the file called climate.csv in the data folder
# First, we assign the relative path of the file to a variable, file
# Note that the path uses forward slashes
file = "./data/climate.csv"
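A relative path is interpreted starting from the current working directory, which Jupyter normally sets to the folder containing the notebook. A minimal sketch of how to see the full path Python will actually use (the file does not need to exist for this):

```python
from pathlib import Path

file = "./data/climate.csv"
# resolve() turns the relative path into an absolute one,
# based on the current working directory
absolute = Path(file).resolve()
print(absolute)
```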

In [9]:
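# Now we read the file with the pandas read_csv function
# The data will be stored in the variable called inClimate
# skiprows=np.arange(0, 0) is an empty array, so no rows are skipped;
# values of -99.99 in the file are read as missing (NaN)
inClimate = pd.read_csv(
    file, skiprows=np.arange(0, 0), na_values="-99.99"
)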

In [10]:
# Just type the variable name to display the data
# One year of weather data for a randomly chosen weather station in Quebec
inClimate

Unnamed: 0,Longitude (x),Latitude (y),Station Name,Climate ID,Date/Time,Year,Month,Day,Data Quality,Max Temp (°C),...,Total Snow (cm),Total Snow Flag,Total Precip (mm),Total Precip Flag,Snow on Grnd (cm),Snow on Grnd Flag,Dir of Max Gust (10s deg),Dir of Max Gust Flag,Spd of Max Gust (km/h),Spd of Max Gust Flag
0,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-01,2010,1,1,,-3.0,...,4.0,,4.0,,26.0,,,,,
1,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-02,2010,1,2,,-8.0,...,10.0,,10.0,,27.0,,,,,
2,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-03,2010,1,3,,-5.0,...,5.0,,5.0,,35.0,,,,,
3,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-04,2010,1,4,,-7.0,...,0.0,T,0.0,T,,M,,,,
4,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-05,2010,1,5,,-5.5,...,0.0,,0.0,,35.0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
360,-74.17,45.32,COTEAU DU LAC,7011947,2010-12-27,2010,12,27,,-10.5,...,0.0,,0.0,,25.0,,,,,
361,-74.17,45.32,COTEAU DU LAC,7011947,2010-12-28,2010,12,28,,-4.0,...,0.0,,0.0,,25.0,,,,,
362,-74.17,45.32,COTEAU DU LAC,7011947,2010-12-29,2010,12,29,,0.0,...,0.0,,0.0,,25.0,,,,,
363,-74.17,45.32,COTEAU DU LAC,7011947,2010-12-30,2010,12,30,,1.5,...,0.0,,0.0,,25.0,,,,,


**If we had used a Windows path with single backslashes, Python could have misread it or raised an error. <br>With forward slashes, doubled backslashes or a raw string (r''),
the code runs fine.**

**There are at least two solutions, using modules from Python's standard library, that work on Windows, macOS and Linux alike:**
1. `os.path.join()` <br>
`import os.path` <br>
`data_folder = os.path.join("source_data", "text_files")` <br>
`file_to_open = os.path.join(data_folder, "raw_data.txt")` <br>
`f = open(file_to_open)`

2. A shorter syntax uses Python's `pathlib` library <br>
`from pathlib import Path` <br>
`data_folder = Path("source_data/text_files/")` <br>
`file_to_open = data_folder / "raw_data.txt"` <br>
`f = open(file_to_open)`
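Why do these work on every OS? `Path` automatically picks the path flavour of the system it runs on. The "pure" path classes in `pathlib` let you see both flavours side by side on any machine, since they only manipulate strings. A sketch, reusing the example folder names from above:

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same join, rendered with the separator of each operating system
win = PureWindowsPath("source_data", "text_files") / "raw_data.txt"
posix = PurePosixPath("source_data", "text_files") / "raw_data.txt"

print(win)    # source_data\text_files\raw_data.txt
print(posix)  # source_data/text_files/raw_data.txt
print(win.parts == posix.parts)  # True: same components, different separators
```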

In [11]:
# We will try the first method
# Start by importing the relevant library
import os.path

In [12]:
# Use os.path.join to build the file path, then read and display the data
data_folder = os.path.join("data")
file_to_open = os.path.join(data_folder, "climate.csv")
# The with block closes the file again once pandas has read it
with open(file_to_open) as f1:
    inClimate1 = pd.read_csv(
        f1, skiprows=np.arange(0, 0), na_values="-99.99"
    )
inClimate1

Unnamed: 0,Longitude (x),Latitude (y),Station Name,Climate ID,Date/Time,Year,Month,Day,Data Quality,Max Temp (°C),...,Total Snow (cm),Total Snow Flag,Total Precip (mm),Total Precip Flag,Snow on Grnd (cm),Snow on Grnd Flag,Dir of Max Gust (10s deg),Dir of Max Gust Flag,Spd of Max Gust (km/h),Spd of Max Gust Flag
0,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-01,2010,1,1,,-3.0,...,4.0,,4.0,,26.0,,,,,
1,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-02,2010,1,2,,-8.0,...,10.0,,10.0,,27.0,,,,,
2,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-03,2010,1,3,,-5.0,...,5.0,,5.0,,35.0,,,,,
3,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-04,2010,1,4,,-7.0,...,0.0,T,0.0,T,,M,,,,
4,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-05,2010,1,5,,-5.5,...,0.0,,0.0,,35.0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
360,-74.17,45.32,COTEAU DU LAC,7011947,2010-12-27,2010,12,27,,-10.5,...,0.0,,0.0,,25.0,,,,,
361,-74.17,45.32,COTEAU DU LAC,7011947,2010-12-28,2010,12,28,,-4.0,...,0.0,,0.0,,25.0,,,,,
362,-74.17,45.32,COTEAU DU LAC,7011947,2010-12-29,2010,12,29,,0.0,...,0.0,,0.0,,25.0,,,,,
363,-74.17,45.32,COTEAU DU LAC,7011947,2010-12-30,2010,12,30,,1.5,...,0.0,,0.0,,25.0,,,,,


In [13]:
# And the second method
# Start by importing the relevant library
from pathlib import Path

In [14]:
# Use Path and the / operator to build the file path, then read and display the data
data_folder = Path("data")
file_to_open = data_folder / "climate.csv"
# The with block closes the file again once pandas has read it
with open(file_to_open) as f2:
    inClimate2 = pd.read_csv(
        f2, skiprows=np.arange(0, 0), na_values="-99.99"
    )
inClimate2

Unnamed: 0,Longitude (x),Latitude (y),Station Name,Climate ID,Date/Time,Year,Month,Day,Data Quality,Max Temp (°C),...,Total Snow (cm),Total Snow Flag,Total Precip (mm),Total Precip Flag,Snow on Grnd (cm),Snow on Grnd Flag,Dir of Max Gust (10s deg),Dir of Max Gust Flag,Spd of Max Gust (km/h),Spd of Max Gust Flag
0,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-01,2010,1,1,,-3.0,...,4.0,,4.0,,26.0,,,,,
1,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-02,2010,1,2,,-8.0,...,10.0,,10.0,,27.0,,,,,
2,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-03,2010,1,3,,-5.0,...,5.0,,5.0,,35.0,,,,,
3,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-04,2010,1,4,,-7.0,...,0.0,T,0.0,T,,M,,,,
4,-74.17,45.32,COTEAU DU LAC,7011947,2010-01-05,2010,1,5,,-5.5,...,0.0,,0.0,,35.0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
360,-74.17,45.32,COTEAU DU LAC,7011947,2010-12-27,2010,12,27,,-10.5,...,0.0,,0.0,,25.0,,,,,
361,-74.17,45.32,COTEAU DU LAC,7011947,2010-12-28,2010,12,28,,-4.0,...,0.0,,0.0,,25.0,,,,,
362,-74.17,45.32,COTEAU DU LAC,7011947,2010-12-29,2010,12,29,,0.0,...,0.0,,0.0,,25.0,,,,,
363,-74.17,45.32,COTEAU DU LAC,7011947,2010-12-30,2010,12,30,,1.5,...,0.0,,0.0,,25.0,,,,,


***
## 2.3 Indentation
Python uses `indentation` to mark blocks of code. Whitespace is used for indentation (you choose the number of spaces yourself, although four spaces per level is the convention recommended by PEP 8). <br>
All consecutive statements indented by the same amount belong to the same block of code (in R, for example, blocks are marked with `curly braces` instead).

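Python raises an error (`IndentationError`) when the indentation is obviously inconsistent, for example when an expected indented block is missing. But when a line is indented at a level that is valid yet wrong, there is no structural mismatch, and you get no warning at all. <br>
This can become very problematic: the code runs, but does something different from what you intended.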
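Both cases can be reproduced with the built-in `compile()`, which checks the syntax of a piece of source code without running it. In this sketch the two code snippets are made up for illustration:

```python
# Case 1: obvious mismatch, no indented block after the for statement
obvious = """
for i in range(3):
if i > 0:
    print(i)
"""
caught = False
try:
    compile(obvious, "<obvious>", "exec")
except IndentationError as err:
    caught = True
    print("IndentationError:", err.msg)

# Case 2: valid indentation, wrong logic. The addition should be inside
# the loop, but it was dedented, so it runs only once, after the loop.
silent = """
total = 0
for i in range(1, 4):
    square = i * i
total += square
"""
namespace = {}
exec(compile(silent, "<silent>", "exec"), namespace)
print(namespace["total"])  # 9 (only the last square), not 1 + 4 + 9 = 14
```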

**Let's look at a few examples to illustrate this!**

***
*The first example below shows the correct code!
We want to calculate the factors of a given number.*

In [18]:
# Define a function to calculate the factors of a number
# The name of the function is get_factors1

def get_factors1(num):
    df = open('factorsCorrect.txt', 'w')
    for i in range(1, num + 1):
        if num % i == 0:
            print("{} is a factor of {}.".format(i, num))
            df.write(str(i))
            df.write('\n')  # write the results of the function to a file on disk
    df.close()  # the parentheses are needed to actually call close()

In [19]:
get_factors1(10) # Run the function to find out what the factors are of 10

1 is a factor of 10.
2 is a factor of 10.
5 is a factor of 10.
10 is a factor of 10.
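For comparison, here is a sketch of a more idiomatic variant. It is not the notebook's code: the function name `get_factors` and its `path` parameter are hypothetical, it returns the factors as a list, and it uses a `with` block so the file is always closed. It writes to a temporary folder so that running it leaves no files behind:

```python
import os
import tempfile

def get_factors(num, path):
    """Return the factors of num and write them, one per line, to path."""
    factors = [i for i in range(1, num + 1) if num % i == 0]
    # The with block closes the file automatically, even if an error occurs
    with open(path, "w") as out_file:
        for factor in factors:
            out_file.write(f"{factor}\n")
    return factors

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "factors.txt")
    result = get_factors(10, path)
    with open(path) as f:
        written = f.read().split()

print(result)   # [1, 2, 5, 10]
print(written)  # ['1', '2', '5', '10']
```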


***
*The second example below shows the code with an obvious mismatch!<br>
`You will get an error message`.*

In [20]:
# Define a function to calculate the factors of a number
# The name of the function is get_factors2
# The code has an obvious indentation mismatch: after the <if> statement, an indented block is expected!

def get_factors2(num):
    df = open('factorsError1.txt', 'w')
    for i in range(1, num + 1):
        if num % i == 0:
        print("{} is a factor of {}.".format(i, num)) # this is an obvious indentation error, because an indented block is expected after the IF statement
            df.write(str(i))
            df.write('\n')  # write the results of the function to a file on disk
    df.close()

IndentationError: expected an indented block (1565662213.py, line 9)

In [21]:
get_factors2(10)

NameError: name 'get_factors2' is not defined

***
*The third example below shows the code with an indentation error that Python will not catch!<br>
You will NOT get an error message, yet the result will be wrong.*

In [22]:
def get_factors3(num):
    df = open('factorsError2.txt', 'w')
    for i in range(1, num + 1):
        if num % i == 0:
            print("{} is a factor of {}.".format(i, num))
        df.write(str(i))  # This line (and the next) belongs in the IF block, but was accidentally dedented into the FOR block. The results will be wrong, yet the syntax is fine!
        df.write('\n')  # write the results of the function to a file on disk
    df.close()

In [23]:
get_factors3(10)
# The print results look okay. But check your factorsError2.txt file!!!

1 is a factor of 10.
2 is a factor of 10.
5 is a factor of 10.
10 is a factor of 10.
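What ends up in factorsError2.txt? The sketch below mirrors `get_factors1` and `get_factors3`, but collects the values in a list instead of writing a file, so the two results can be compared directly. The function names `written_correct` and `written_example3` are made up for this illustration:

```python
def written_correct(num):
    lines = []
    for i in range(1, num + 1):
        if num % i == 0:
            lines.append(str(i))  # inside the if: only factors are kept
    return lines

def written_example3(num):
    lines = []
    for i in range(1, num + 1):
        if num % i == 0:
            print("{} is a factor of {}.".format(i, num))
        lines.append(str(i))  # dedented into the for: EVERY i is kept
    return lines

print(written_correct(10))   # ['1', '2', '5', '10']
print(written_example3(10))  # ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
```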


***
*The fourth example below shows the code with another indentation error that Python will not catch!<br>
You will NOT get an error message, yet the result will be wrong.*

In [24]:
def get_factors4(num):
    df = open('factorsError3.txt', 'w')
    for i in range(1, num + 1):
        if num % i == 0:
            print("{} is a factor of {}.".format(i, num))
    df.write(str(i))  # This line (and the next) belongs in the IF block, but was accidentally dedented into the function block. The results will be wrong, yet the syntax is fine!
    df.write('\n')  # write the results of the function to a file on disk
    df.close()

In [25]:
get_factors4(10)
# The print results look okay. But check your factorsError3.txt file!!!

1 is a factor of 10.
2 is a factor of 10.
5 is a factor of 10.
10 is a factor of 10.
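Here the misplaced lines sit outside the loop, so they run only once, with the last value of `i`, which is always `num` itself. A sketch mirroring `get_factors4` that collects the values in a list instead of a file (the name `written_example4` is hypothetical):

```python
def written_example4(num):
    lines = []
    for i in range(1, num + 1):
        if num % i == 0:
            print("{} is a factor of {}.".format(i, num))
    lines.append(str(i))  # outside the loop: runs once, after the loop ends
    return lines

print(written_example4(10))  # ['10'] instead of ['1', '2', '5', '10']
```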


***
### Pay attention to your indentations in Python!!!
***