## Great!
Now that we have Jupyter Notebook installed through Anaconda (if you haven't done this, see `01_intro_jupyter_notebook.ipynb`), we can go over some Python fundamentals and specific data structures that are commonly used by the lab.

I will be labeling each topic with Markdown cells, so if you are following along at home and already know the Python/NumPy/etc. basics, you can skip that section.

## Python: the absolute basics

Python is an object-oriented programming language favored by many computer science students, beginner programmers, academics, and data scientists. It runs on basically every computer out there. If you have literally zero experience with any programming language, I would suggest doing some Python tutorials first. When I was an undergraduate, I used [Learn Python the Hard Way](https://learncodethehardway.org/python/), but it's not free anymore :( ... so I would look for a course on Codecademy or Udemy instead. I'll go over the basics below, but I am not an experienced code teacher, so a proper course will serve you better if you're starting from scratch.

#### Comments
Essentially, Python can be thought of as a really, really fancy calculator. Like a graphing calculator but even cooler. The first thing we are going to learn (because it makes this notebook the easiest to write) is **comments**. A comment starts with `#`, and Python ignores everything from the `#` to the end of that line. This lets you put plain English in between your lines of code to help other people read it. This is a Markdown cell, but the cells below in this section are code cells with comments. To understand the code, read the comments and they will help!

In [1]:
# This is a comment :)
# Ok so like I said, Python is just a fancy calculator. It can do basic calculator stuff like this:
1 + 1 # This line is our input and our output will be displayed below

2

#### Variables

In [2]:
# Pretty sick, right? You can do math with Python.
# You can also assign variables and then do math. Let's set the variable x to 1.
x = 1
x + x

2

In [3]:
# Variables can have any name you want, and you can change the value of a variable many times.
# You can also assign variables a value using already-defined variables.
this_is_a_random_variable_name = x + x # = 2
this_is_a_random_variable_name + x

3

In [4]:
# Variables can also be reassigned to a new value.
old_x = x
x = 7
x + old_x

8

#### Common classes of data structures
Turns out you can do more than just adding numbers in Python. Python can manipulate many different types of data. Let's look at some.

In [5]:
# First off, to check the class of data, you can use type().
type(9)
# It can be important to know the class of your data, because some code will only work if you give it the right class of data.

int

In [6]:
# INTEGERS (int) and FLOATS (float)
# As demonstrated above, whole numbers are integers. Numbers with a decimal point are floats.
# Dividing with / always gives you a FLOAT, even when the answer is a whole number (8/2 gives 4.0).
type(9/2)

float
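
Since `/` always gives back a FLOAT, it's handy to know Python's two other division operators: `//` (floor division) and `%` (the remainder, or "modulo"). Here's a quick sketch; the variable names are just made up for the example.

```python
true_div = 8 / 2    # "/" always returns a float, even when the answer is whole
floor_div = 9 // 2  # "//" (floor division) rounds the result down to a whole number
remainder = 9 % 2   # "%" (modulo) gives what's left over after dividing

print(true_div, type(true_div))    # 4.0 <class 'float'>
print(floor_div, type(floor_div))  # 4 <class 'int'>
print(remainder)                   # 1
```

One gotcha: `//` rounds *down*, not toward zero, so `-9 // 2` is `-5`.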

In [7]:
# INTEGERS and FLOATS can be added/subtracted/whatever'd together too. Mixing them gives you a FLOAT.
x = 9
y = 4.5
x/y

2.0

In [8]:
# To convert between INTEGER and FLOAT, wrap the value in int() or float().
int(x/y)

2
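
A little more on converting between numbers, because `int()` doesn't round the way you might expect. This is just a small sketch with made-up variable names:

```python
chopped = int(2.9)      # int() drops the decimal part; it does NOT round
negative = int(-2.9)    # it chops toward zero, so this is -2, not -3
as_float = float(3)     # float() turns a whole number into a float
rounded = round(2.9)    # use round() when you actually want rounding

print(chopped, negative, as_float, rounded)  # 2 -2 3.0 3
```

Also, `round()` sends exact halves to the nearest even number, so `round(2.5)` is `2` and `round(3.5)` is `4`.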

In [9]:
# STRINGS (str) are text written inside quotes. They can't be added directly to INTEGERS or FLOATS.
# This code will raise an error:
"this is a string" + 5

TypeError: can only concatenate str (not "int") to str
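
If you do want to glue a number onto a string, convert it first with `str()`. A minimal sketch (the variable names are just for illustration):

```python
number = 5
# Convert the int to a string with str() first, then + works
sentence = "this is a string " + str(number)
print(sentence)  # this is a string 5

# It works the other way too: int() turns a string of digits into an int
back_to_number = int("5") + 1
print(back_to_number)  # 6
```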

In [10]:
# However, you can add STRINGS together!
"this is" + " a string"

'this is a string'

In [11]:
# You can also format strings, which is a very handy tool. This is done with "f-strings": put an f before the quotes, and any code inside {} gets filled in.
# I'm gonna gloss over it. You can read more about them here: https://realpython.com/python-f-strings/
f"One plus one is {1+1}."

'One plus one is 2.'
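
f-strings can also control *how* a value is displayed, by adding a format spec after a colon inside the `{}`. A quick sketch; the file name here is hypothetical, just to show zero-padding:

```python
value = 2 / 3
# Any variable or expression inside {} gets dropped into the string
long_version = f"Two thirds is about {value}"
# A format spec after a colon controls how it looks: ".2f" means 2 decimal places
short_version = f"Two thirds is about {value:.2f}"
# "03d" pads a whole number with zeros to 3 digits, handy for file names
file_name = f"trial_{7:03d}.wav"

print(long_version)   # Two thirds is about 0.6666666666666666
print(short_version)  # Two thirds is about 0.67
print(file_name)      # trial_007.wav
```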

In [12]:
# LISTS are special: they hold multiple items (of any class). To make one, put the items inside brackets []
# and separate them with commas.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
x

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [13]:
# Once we have a LIST, we can do some cool things, such as indexing.
# Let's say we wanted to know the first item in our LIST "x"
x[0]
# Wait, why didn't you do x[1]? This is because Python is "zero-indexed." The first item is 0, not 1.
# This differs between programming languages and is kind of a headache... just a quirk to learn I guess :)

1

In [14]:
# You can get the last item in a LIST using a negative index
x[-1]

9

In [15]:
# Or a range of items by separating the start and end indices with a colon.
# The end index is NOT included, so x[2:5] gives the items at indices 2, 3, and 4.
x[2:5]

[3, 4, 5]
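
Slices have a few more tricks: you can leave out the start or end, and add a third number for the step size. A small sketch using the same list `x` as above:

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Leave out the start to begin at index 0...
first_three = x[:3]    # [1, 2, 3]
# ...or leave out the end to go all the way to the end
last_three = x[-3:]    # [7, 8, 9]
# A third number is the step size
every_other = x[::2]   # [1, 3, 5, 7, 9]
backwards = x[::-1]    # [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(first_three, last_three, every_other, backwards)
```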

In [16]:
# LISTS can also be added together through a process called "concatenation"
[1.5,2.5,3.5] + [4.5,5.5,6.5]

[1.5, 2.5, 3.5, 4.5, 5.5, 6.5]

In [17]:
# We can learn how many items are in a LIST using len(LIST)
len(["apples","bananas","strawberries","peaches"])

4
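
LISTS can also grow and change after you make them. Here's a sketch of a few everyday list operations (we'll use `.append()` again when reading .csv files later); the fruit names are just example data:

```python
fruits = ["apples", "bananas", "strawberries", "peaches"]
fruits.append("mangos")     # .append() adds one item to the end of the list
print(len(fruits))          # 5
print("apples" in fruits)   # True: "in" checks whether an item is in the list
print("kiwis" in fruits)    # False
fruits[0] = "pears"         # you can replace an item by assigning to its index
print(fruits)  # ['pears', 'bananas', 'strawberries', 'peaches', 'mangos']
```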

In [18]:
# DICTIONARIES (or dict) are basically fancier LISTS. Instead of numbered positions, they store "keys" and "values", and use curly braces {}
# Here's how it works: you separate items with commas just like LISTS, but you separate each key from its value with a colon :
example_dict = {"key1":1, "key2":2, "key3":3}

# You can then return an individual value from a DICTIONARY by putting the key in brackets []
# kinda like how you would index a LIST!
example_dict["key2"]

# Unlike LISTS, DICTIONARIES can't be combined with + (that raises a TypeError). To merge one into another, use .update().

2
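
Here's a sketch of the usual ways to add to and combine DICTIONARIES, since `+` doesn't work on them. The extra keys are made up for the example:

```python
example_dict = {"key1": 1, "key2": 2, "key3": 3}
# Add a new key (or overwrite an existing one) by assigning to it
example_dict["key4"] = 4
# .update() merges another dictionary into this one
example_dict.update({"key5": 5})
# .get() returns a fallback value instead of raising a KeyError for a missing key
missing = example_dict.get("key99", 0)

print(example_dict)  # {'key1': 1, 'key2': 2, 'key3': 3, 'key4': 4, 'key5': 5}
print(missing)       # 0
```

On Python 3.9 or newer, `dict1 | dict2` also builds a new, merged dictionary.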

In [19]:
# If you forget what the keys in your DICTIONARY are, you can find out like this:
example_dict.keys()

dict_keys(['key1', 'key2', 'key3'])

In [20]:
# Same for values.
example_dict.values()

dict_values([1, 2, 3])

#### Functions

**Functions** let you bundle up code so you can reuse it instead of writing the same lines over and over. Remember `type()` from the last segment? That was a function! You call (run) a function by writing its name followed by parentheses, such as `print()`. Some functions take values that you "pass" to them inside the parentheses, and some don't need any. (A value passed to a function is called an **argument**.)

In [21]:
# Writing your own functions is a great way to make readable, easy-to-understand code.
# However, some functions are included with the "base" of Python, such as print().
# print() takes what you want it to print. This can be ints, strings, floats, lists...or anything really. You can even pass several things at once, separated by commas.
# Print is a very simple function but one you will learn to love as it helps you figure out where your code isn't working.
print("Hello, world!")

Hello, world!


In [22]:
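# You may notice there is no Out[] before the output of the last cell.
# That's because Jupyter only shows an Out[] for the value of the LAST line of a cell, and print() displays text instead of returning a value.
# For example, we can take our 1 + 1 code from earlier, but if we print after it, we won't see "2," because 1 + 1 is no longer the last line.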
1 + 1
print("Hello, world!")

Hello, world!


In [23]:
# If we wanted to see both, we could print() both.
print(1 + 1)
print("Hello, world!")

2
Hello, world!


In [24]:
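# The output (or "return value") of a function can also be assigned to a variable.
# Careful, though: print() only displays text and returns None (Python's "nothing" value),
# so x ends up as None, which is why nothing shows up below the printed text.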
x = print("Hello, world!")
x

Hello, world!


In [25]:
# We can also write our own functions using def:
def custom_function(x): # x is an argument the user will pass, in the same way we tell print() what to print
    # This next section, started and stopped by three quotation marks, is the description of our function. It's like a comment kinda
    '''
    Takes a number and adds one to it.
    '''
    x_added = x + 1 # everything inside our function definition must be indented (you'll see this again with loops)
    return x_added # return ends the function and hands x_added back to whoever called it

# Now let's use our function
custom_function(7)

8
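
Functions can take more than one argument, give an argument a default value, and even return more than one thing. Both functions below are hypothetical examples, not something from the lab's code:

```python
def add_some(x, amount=1):
    '''
    Adds `amount` to x. If no amount is passed, it adds one.
    '''
    return x + amount

def min_and_max(numbers):
    '''
    Returns both the smallest and the largest item in a list.
    '''
    return min(numbers), max(numbers)  # returning two values gives back a tuple

print(add_some(7))              # 8, uses the default amount=1
print(add_some(7, amount=10))   # 17
low, high = min_and_max([4, 9, 1, 7])  # unpack the two returned values into two variables
print(low, high)                # 1 9
```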

In [26]:
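# The description (or "docstring" as nerds call it) for a function can be viewed by typing the function name followed by a ?
# (The ? trick is a Jupyter feature. In plain Python, help(custom_function) does the same job.)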
custom_function?
# NOTE: this doesn't output anything unless you are running the notebook so if you're viewing this in browser it will be blank

In [28]:
# Some docstrings have a lot, lot more information than the one that I just wrote:
print?

#### Loops
Loops allow you to iterate through data step-by-step. If you want to perform a simple action over and over again, then a loop is what you want!

In [29]:
# The most common type of loop is a "for loop." You iterate through each item in a list or some other structure.
for number in range(10): # the first line of the loop ends with a colon. range(10) counts from 0 up to 9
    print(number) # Everything "inside" the loop is indented. The loop ends when you outdent.

0
1
2
3
4
5
6
7
8
9
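
`range()` can take up to three arguments: start, stop, and step. The stop number is never included. A short sketch (wrapping a range in `list()` just lets us print all of its numbers at once):

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 6)))       # [2, 3, 4, 5]
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]

# Adding up the numbers 1 through 10 with a loop
total = 0
for number in range(1, 11):
    total = total + number
print(total)  # 55
```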


In [30]:
# Another example
for letter in "Hello, world!":
    print(letter)

H
e
l
l
o
,
 
w
o
r
l
d
!


In [31]:
# If you want both the value and the index, you can use enumerate().
for index, letter in enumerate("Hello, world!"):
    print(index, letter)

0 H
1 e
2 l
3 l
4 o
5 ,
6  
7 w
8 o
9 r
10 l
11 d
12 !
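
Two more looping helpers you'll see a lot: `zip()` walks through two lists side by side, and a DICTIONARY's `.items()` gives you each key and value together. The lists here are made-up example data:

```python
letters = ["a", "b", "c"]
positions = [1, 2, 3]
paired = []
for letter, position in zip(letters, positions):
    paired.append(f"{letter}{position}")
print(paired)  # ['a1', 'b2', 'c3']

example_dict = {"key1": 1, "key2": 2, "key3": 3}
value_total = 0
for key, value in example_dict.items():
    print(key, value)
    value_total = value_total + value
print(value_total)  # 6
```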


#### Booleans
Boolean logic is separating things into True and False. What if you only want part of your code to run when certain things are happening? That's where booleans come in.

In [32]:
# We can check booleans using "if statements." The syntax for these is similar to the for loops discussed above.
if 1 + 1 == 2: # a double equals sign means "is equal to" (a single = assigns a variable)
    print("Yep, that's right.")
    
if 1 + 1 == 3:
    print("This isn't right, so this part won't print.")
    
if 1 + 1 != 3: # exclamation-equals means "is not equal to"
    print("Yeah, this is right too.")

Yep, that's right.
Yeah, this is right too.


In [33]:
# Booleans are actually their own class in Python!
print(type(True))

x = True

if x == True:
    print("X is true because we made it true.")
else: # "else" means if the if statement is False, then do this instead.
    print("Somehow, x became not true...")

<class 'bool'>
X is true because we made it true.
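
For more than two cases, add `elif` ("else if") branches, and combine conditions with `and`, `or`, and `not`. A minimal sketch with made-up cutoffs:

```python
x = 7
if x < 5:
    size = "small"
elif x < 10:  # only checked if the condition above was False
    size = "medium"
else:         # runs if none of the conditions above were True
    size = "large"
print(size)  # medium

in_range = x > 5 and x < 10   # True: "and" needs both sides to be True
either = x < 5 or x == 7      # True: "or" needs at least one side to be True
flipped = not x == 7          # False: "not" flips True and False
print(in_range, either, flipped)  # True True False
```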


In [34]:
# Let's combine a loop with an if statement!

for letter in "Hello, world!":
    if letter == 'l':
        print(f"This is my favorite letter: {letter}")

This is my favorite letter: l
This is my favorite letter: l
This is my favorite letter: l


In [36]:
# You can also do something similar in one line using a list comprehension
# List comprehensions are really handy tools that can make code more readable
print([f"This is my favorite letter: {letter}" for letter in "Hello, world!" if letter == "l"])

['This is my favorite letter: l', 'This is my favorite letter: l', 'This is my favorite letter: l']
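
To see what a list comprehension is doing, here's the same job written both ways. It's a small sketch; `.upper()` just makes each letter uppercase:

```python
word = "Hello, world!"

# The for-loop version: start with an empty list and append to it
loop_version = []
for letter in word:
    if letter == "l":
        loop_version.append(letter.upper())

# The list comprehension version: [what to keep  for item in things  if condition]
comprehension_version = [letter.upper() for letter in word if letter == "l"]

print(loop_version)                           # ['L', 'L', 'L']
print(loop_version == comprehension_version)  # True
```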


#### Imports

In [37]:
# So if you had to write every function you wanted to use yourself, that would suck. The solution to this is to use imports.
# Imports let you add other people's functions that they wrote into your notebook. To do this, you have to import a package.
# Let's use the package "os." It contains a lot of functions that tell Python things about your operating system
import os
# Now we can use functions from the os package
os.getcwd() # This function tells us the directory our code is running in
# When using functions from a package, you write the package name, then a period, then the function you want from that pkg.

'F:\\git\\lab_intro_notebooks'

In [38]:
# We can also import packages as variables! Wow, variables do so much how cool wowwwowowowow
import os as turtle # this is kinda like turtle = os
turtle.getcwd()

'F:\\git\\lab_intro_notebooks'
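
You can also import just one function from a package with `from ... import ...`, so you don't have to type the package name every time. This sketch uses two standard-library modules, `math` and `os.path`; the folder and file names are only an example:

```python
import math
from os.path import join  # "from ... import ..." pulls in just one function

print(math.sqrt(16))  # 4.0
print(math.pi)        # 3.141592653589793

# Because we imported join directly, we can call it without writing os.path. in front.
# join() glues folder and file names together with the right separator for your OS.
example_path = join("supplemental_files", "addresses.csv")
print(example_path)   # uses / on Mac/Linux and \ on Windows
```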

## NumPy: a data scientist's best friend

NumPy is a package that data scientists commonly import. Why? Because it does a ton! In a nutshell, NumPy adds a bunch of fast, high-level mathematical tools to Python. Most important are matrices, which NumPy calls arrays.

In [39]:
import numpy as np # commonly imported as np so it's easier to write
# Here are some ways to make an array.
x = np.array([1,2,3]) # kinda like a list, right? you can index them in the same way as a list :)
print(x)
# Here's another way, the arange() function (like range(), it stops just before the number you give it)
x = np.arange(10)
print(x)
# Another, the zeros() function
x = np.zeros((1,10)) # the argument is the shape: (number of rows, number of columns)
print(x)
# np.linspace() also creates arrays but takes more arguments: start, stop, and how many evenly spaced points to make (stop is included)
x = np.linspace(1,5,20)
print(x)

[1 2 3]
[0 1 2 3 4 5 6 7 8 9]
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
[1.         1.21052632 1.42105263 1.63157895 1.84210526 2.05263158
 2.26315789 2.47368421 2.68421053 2.89473684 3.10526316 3.31578947
 3.52631579 3.73684211 3.94736842 4.15789474 4.36842105 4.57894737
 4.78947368 5.        ]
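
A couple more ways to build arrays: `arange()` also accepts a start and step, `np.ones()` works like `np.zeros()`, and `.reshape()` rearranges the same numbers into a new shape. A minimal sketch:

```python
import numpy as np

counts = np.arange(12)        # 0 through 11; like range(), the stop value is excluded
grid = counts.reshape(3, 4)   # the same 12 numbers, rearranged into 3 rows x 4 columns
print(grid)
print(grid.shape)             # (3, 4)

evens = np.arange(0, 10, 2)   # start, stop, step
ones = np.ones((2, 3))        # like np.zeros(), but filled with 1.0
print(evens)                  # [0 2 4 6 8]
print(ones)
```

`.reshape()` only works if the new shape holds exactly as many values as the original (here 3 x 4 = 12).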


In [40]:
# Arrays can also be multidimensional. Most arrays in the last cell were 1D (np.zeros((1,10)) was actually 2D, 1 row by 10 columns, hence the double brackets). Let's make a bigger 2D array.
two_dimensions = np.zeros((5,5))
print(two_dimensions)

# What about 3D?
three_dimensions = np.zeros((5,5,5))
print(three_dimensions)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
[[[0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]]

 [[0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]]

 [[0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]]

 [[0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]]

 [[0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]]]


In [41]:
# You can make an array with any number of dimensions. This gets confusing fast, because
# multidimensional arrays quickly become hard to read using print(). Luckily, np.shape() can help with this!
print(np.shape(two_dimensions)) # a 5x5 square
print(np.shape(three_dimensions)) # a 5x5x5 cube

# You can also check the shape of part of an array: two_dimensions[0] is its first row, which is 1D
print(np.shape(two_dimensions[0]))

(5, 5)
(5, 5, 5)
(5,)


In [42]:
# With arrays, you can index along several dimensions (called axes) at once, separating them with commas.
# Here, [0,1,4] means: the first 5x5 block, its second row, and the fifth value in that row.
print(three_dimensions[0,1,4])

# If we want all the values along an axis, use a colon. Grabbing a range like this is called slicing.
print(three_dimensions[1,:,2])

0.0
[0. 0. 0. 0. 0.]
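
Since arrays full of zeros don't show much, here's the same idea on a small made-up array with different values, plus "boolean masking," which pulls out only the values that pass a test:

```python
import numpy as np

data = np.array([[1, 5, 2],
                 [8, 3, 9]])
print(data[1, 2])    # 9: row 1, column 2 (both zero-indexed)
print(data[:, 0])    # [1 8]: every row, column 0
print(data[0, 1:])   # [5 2]: row 0, columns 1 to the end

# Comparing an array to a number gives an array of True/False values...
mask = data > 4
# ...which can then be used to pull out just the matching values
print(data[mask])    # [5 8 9]
```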


In [43]:
# You may remember we can concatenate lists using +. It's different for arrays: + adds them element by element,
# and np.concatenate() joins them end to end.
x = np.array([1,2,3])
y = np.array([4,5,6])

print(x + y)

print(np.concatenate((x,y)))

[5 7 9]
[1 2 3 4 5 6]


In [44]:
# For higher dimensional arrays it can get complicated
z = np.array((x,y))
print(z.shape)
# To concatenate, arrays need the same number of dimensions (this will cause an error, since z is 2D and x is 1D)
print(np.concatenate((z,x)))

(2, 3)


ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

In [45]:
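# To get around this, you can use np.vstack() ("vertical stack"), which treats a 1D array as one row.
# Its counterpart np.hstack() stacks arrays side by side, as new columns.
print(np.vstack((z,x)).shape) # adds the 1D array x as a new row under the 2D array z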

(3, 3)
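
If you'd rather stick with `np.concatenate()`, you can first reshape `x` so it's 2D, then pick the axis to join along. And `np.hstack()` adds columns, as long as the new column has one value per row. A sketch; `new_column` is a made-up example:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
z = np.array((x, y))                # shape (2, 3)

# Option 1: make x 2D (1 row x 3 columns), then concatenate along axis 0 (rows)
x_as_row = x.reshape(1, 3)
stacked_rows = np.concatenate((z, x_as_row), axis=0)
print(stacked_rows.shape)           # (3, 3), same result as np.vstack((z, x))

# Option 2: add a new column with np.hstack(); it needs one value per row of z
new_column = np.array([[7], [8]])   # shape (2, 1)
stacked_cols = np.hstack((z, new_column))
print(stacked_cols.shape)           # (2, 4)
print(stacked_cols)
```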


In [46]:
# There are lots of other ways to manipulate arrays.
# If it's an operation you'd do to a matrix in a linear algebra class, there's probably a NumPy function for it.
# I'll lastly cover transposition here. To switch the rows and columns of an array, use .T
print(z.shape)
print(z.T.shape)
# The more you use numpy, the more you'll find cool handy things to do with it!

(2, 3)
(3, 2)
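
Transposing comes up a lot with matrix multiplication. Note that `*` multiplies arrays entry by entry, while `@` does real matrix multiplication. A short sketch using the same `z` as above:

```python
import numpy as np

z = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3), the same z as above

elementwise = z * z              # "*" multiplies matching entries one by one
matrix_product = z @ z.T         # "@" is matrix multiplication: (2, 3) @ (3, 2) gives (2, 2)
column_means = z.mean(axis=0)    # axis=0 averages down each column

print(elementwise)     # [[ 1  4  9]
                       #  [16 25 36]]
print(matrix_product)  # [[14 32]
                       #  [32 77]]
print(column_means)    # [2.5 3.5 4.5]
```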


## Reading .csv's
Comma-separated values (.csv) files, and their tab-separated cousins, are another data format the lab uses a lot. Excel can export them, which makes them a popular format for large datasets. Python has a built-in module for reading them, `csv`.

The name of the game here is to get the .csv into some type of more common structure for Python, most commonly a numpy array or a list. I'll turn a .csv I found on Google into a list in this example.

In [47]:
import csv
example_csv_filepath = f"{os.getcwd()}/supplemental_files/addresses.csv"

csv_list = [] # Make an empty list we will append to
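with open(example_csv_filepath, newline="") as csvfile: # newline="" is what the csv module's docs recommend
    reader = csv.reader(csvfile) # the reader hands back one row at a time, as a list of strings (add delimiter="\t" for tab-separated files)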
    for row in reader:
        csv_list.append(row)
        
print(csv_list)

[['John', 'Doe', '120 jefferson st.', 'Riverside', ' NJ', ' 08075'], ['Jack', 'McGinnis', '220 hobo Av.', 'Phila', ' PA', '09119'], ['John "Da Man"', 'Repici', '120 Jefferson St.', 'Riverside', ' NJ', '08075'], ['Stephen', 'Tyler', '7452 Terrace "At the Plaza" road', 'SomeTown', 'SD', ' 91234'], ['', 'Blankman', '', 'SomeTown', ' SD', ' 00298'], ['Joan "the bone", Anne', 'Jet', '9th, at Terrace plc', 'Desert City', 'CO', '00123']]
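
Two handy variations: `csv.DictReader` uses the first row as column names and gives each row back as a DICTIONARY, and `delimiter="\t"` handles tab-separated files. To keep this sketch self-contained, `io.StringIO` stands in for a real file on disk; the column names are made up, and the values are borrowed from the output above:

```python
import csv
import io

# A tiny tab-separated "file" held in a string (made-up header, values from addresses.csv)
tsv_text = "first\tlast\tstate\nJohn\tDoe\tNJ\nJack\tMcGinnis\tPA\n"

rows = []
with io.StringIO(tsv_text) as tsvfile:
    # delimiter="\t" says the columns are separated by tabs instead of commas
    reader = csv.DictReader(tsvfile, delimiter="\t")
    for row in reader:
        rows.append(row)

print(rows[0]["last"])                 # Doe
print([row["state"] for row in rows])  # ['NJ', 'PA']
```

With a real file, you'd swap `io.StringIO(tsv_text)` for `open(your_filepath, newline="")`.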


## Reading TextGrids
TextGrids (the annotation files made by Praat) need a custom script to load. I put this script in the `supplemental_files` folder, and we are going to import it from there. Like the .csv above, let's convert an example TextGrid to a list. This one's not from the internet, though; it's from my Master's thesis experiment :)

In [48]:
from supplemental_files import textgrid
example_tg_filepath = f"{os.getcwd()}/supplemental_files/OP0001_B1_spkr_02.textgrid"
with open(example_tg_filepath) as tgfile:
    tg = textgrid.TextGrid(tgfile.read())
# Now we can view the tiers using tg.tiers
print(tg.tiers)

[<IntervalTier "phone" (0.01, 605.00) 100.00%>, <IntervalTier "word" (0.01, 605.00) 100.00%>]


In [49]:
# Using the tiers, we can get transcriptions
tg.tiers[0].simple_transcript # get a transcript of the 0th tier

[('0.012471655328798186', '0.2619047619047619', 'sp'),
 ('0.2619047619047619', '4.192970621315193', 'ns'),
 ('4.192970521541951', '4.831519274376418', 'M'),
 ('4.831519274376418', '4.971201814058957', 'AH1'),
 ('4.971201814058957', '5.160770975056689', 'M'),
 ('5.160770975056689', '5.220634920634921', 'S'),
 ('5.220634920634921', '5.270521541950113', 'T'),
 ('5.270521541950113', '5.310430839002267', 'R'),
 ('5.310430839002267', '5.390249433106575', 'AO1'),
 ('5.390249433106575', '5.509977324263038', 'NG'),
 ('5.509977324263038', '5.539909297052154', 'L'),
 ('5.539909297052154', '5.6197278911564625', 'IY0'),
 ('5.6197278911564625', '5.6696145124716555', 'D'),
 ('5.6696145124716555', '5.729478458049887', 'IH2'),
 ('5.729478458049887', '5.809297052154195', 'S'),
 ('5.809297052154195', '5.869160997732426', 'L'),
 ('5.869160997732426', '5.9589569160997735', 'AY1'),
 ('5.9589569160997735', '5.998866213151928', 'K'),
 ('5.998866213151928', '6.0587301587301585', 'S'),
 ('6.0587301587301585', '

In [50]:
# And we can save it to a list, simple as that!
phoneme_transcript = tg.tiers[0].simple_transcript
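
Once the transcript is a list, the tools from earlier apply. Judging from the printed output above, each entry looks like a `(start, end, label)` tuple with the times stored as strings, in seconds; this sketch assumes that shape and uses a few tuples copied from the output, so it runs without the TextGrid file. Treating `"sp"` and `"ns"` as pause/non-speech labels is also an assumption.

```python
# A few (start, end, label) tuples copied from the tier 0 output above.
# Assumption: simple_transcript always has this shape, with times as strings in seconds.
example_transcript = [
    ("0.012471655328798186", "0.2619047619047619", "sp"),
    ("4.192970521541951", "4.831519274376418", "M"),
    ("4.831519274376418", "4.971201814058957", "AH1"),
]

durations = []
for start, end, label in example_transcript:
    # The times are strings, so convert them to floats before doing math
    durations.append((label, float(end) - float(start)))

for label, duration in durations:
    print(f"{label}: {duration:.3f} s")

# Assumption: "sp" and "ns" mark pauses/non-speech, so skip them to keep only phones
speech = [label for start, end, label in example_transcript if label not in ("sp", "ns")]
print(speech)  # ['M', 'AH1']
```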

Okay, that was a lot of information to cover! Please try to work through this before our meeting, preferably in Jupyter Notebook so you can mess around with the variables and functions yourselves. And as always, please reach out through Slack if you are having any problems!

-Garret Kurteff, Feb 2021