Welcome to the Google Colab programming environment. This is a hosted virtual environment where you can run Python programs without installing anything new on your computer.

We're using a Jupyter Notebook (file extension .ipynb): this is an interactive visual interface for running code.

In between the text blocks like this one, you'll see empty "cells" formatted as input lines. Enter Python code in those cells, following the instructions in the text blocks above them. To run the code you wrote, click the "Run" button to the left of the cell or type Shift + Return.


Try it now!  Type this phrase in the code cell below.  


    print('Hello world!')


If you've never written computer code before, now you have!  You just told Python to print out the value inside the parentheses. In this case, that value is the string 'Hello world!'.  Anything within quotation marks is parsed by Python as a string. Python also recognizes numerical values, such as integers (whole numbers) and 'floats' (decimal numbers).  Try this, and see if you can predict what will happen:

    print(1 + 3)

    print('1 + 3')


Value types and variables
---

If you ever have a question about what kind of value you're working with, you can check it with the *type()* function. Try this:

    print(type(3))


In this case, you asked Python to 1) interpret the type of the value within parentheses, and 2) print the result. You're using two nested functions: print() and type().
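
As a small illustration of the same idea, here is a sketch that checks three values that look similar but have different types. The variable names are made up for this example.

```python
# Each value below has a different type, even though they all look like "three".
whole_number = 3        # an integer
decimal_number = 3.0    # a float
text_number = '3'       # a string, because of the quotation marks

print(type(whole_number))    # <class 'int'>
print(type(decimal_number))  # <class 'float'>
print(type(text_number))     # <class 'str'>
```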

If you're interested, a list of additional built-in Python functions is available here: https://docs.python.org/3/library/functions.html


Next, try setting a variable:

    a = 'mongoose'

    print(a)

    print(type(a))


Notice that the variable *a* is treated as equivalent to the string 'mongoose'; the type of the variable is 'str,' or string.


There are lots of rules and guidelines for how to name variables in Python, but don't worry about them too much until you're writing your own programs outside this worksheet. For now, just know that in Python the variable name always goes on the left side of the assignment statement (e.g., x = y), and the value assigned to that variable name goes on the right.

You can reassign and change variables anytime. A new variable assignment overwrites the previous one.

Try this:

    a = 'mongoose'

    a = 'squid'

    print(a)


Python is smart about figuring out what value types it's working with and understanding what it can and can't do with those values. Try:

    a = 'mongoose'

    b = 1

    c = '2'


    print(b + c)


In the output of the code cell above, you should see a "Traceback" error message. Python is letting you know that something didn't work in your code, and explaining why it broke down. In this case, you tried to add two unlike values: the value for b, 1, is an integer, but the value for c, '2', is a string, not an integer, because it's in quotation marks. You can convert the value type with another operation:


    print(b + (int(c)))

Here you converted the value of c, '2', from a string into an integer with the int() operation, and then b and c could be added together.

Note that you have a number of nested operations in this line above: printing, addition, and value type conversion. This makes for a lot of close-parentheses on the end of the line, but that's the only way to let Python know your request is complete.
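
Here is a minimal sketch of the other common conversions, assuming some made-up variable names. It uses float() to turn a string into a decimal number and str() to turn a number back into a string.

```python
price_text = '2.50'     # a string, because of the quotation marks
shipping_text = '4'     # also a string

price = float(price_text)       # 2.5, now a float
shipping = int(shipping_text)   # 4, now an integer
total = price + shipping        # 6.5

# str() converts the number back into a string so it can be joined to other text
label = 'Total: ' + str(total)
print(label)  # Total: 6.5
```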

Interestingly enough, Python can add strings together, too.  This is called string concatenation, and it uses the + sign. Try this:

    print(a + c)

Notice there's no space added between the original strings - they're merged into a single string.  To print both values with a space between them, try:

    print(a, c)

Note that the values for a, b, and c have remained consistent since you first set them, but you can change the value of a variable.

Try this:

    b = b + 1

    b = b - 5

    print(b)

It can be very handy to assign variables to keep count of things for you.  We'll come back to this later.


Boolean values
--

Python also recognizes boolean operators and logical statements.

When asking Python to assess the truth value (i.e., True or False) of a statement, use TWO equals signs: == .  
That's how you let Python know you want to test a truth statement, not assign a new variable.
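
To see the difference side by side, here is a short sketch with a hypothetical variable called score:

```python
score = 7            # one equals sign: assign the value 7 to score

print(score == 7)    # two equals signs: is score equal to 7? True
print(score == 10)   # False
print(score)         # 7, because the comparisons did not change the variable
```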

Try this, and remember that the value of *b* is the same as it was in the last code cell.

    b == 10


Now type and run this, preserving the indentations and punctuation:

    if b < 0:  

      print('b is less than zero')    
    
      print('b is a negative number')  

If statements
--

Note the syntax of the **if statement** above.  Start with 'if' in lower case, then give the expression whose truth value you're testing (in this case, b < 0), and finish with a colon.  Then move to a new line, indent, and give the operation you want Python to run if the condition is true.  The indent is required: every indented line under the if statement belongs to it, so both print statements above run only when b is less than zero. It also lets you see at a glance that the indented operation will only happen if some other condition is met.

You can write longer if-statements using the terms **elif** (a contraction of else-if) and **else**. Here's an example:  

    z = 99  

    if z > 100:  

      print("Big number.")  
    
    elif z < 75:  

      print("Ok.")  
    
    else:  

      print("Middling.")

Feel free to play around with your own variations on these suggestions so far.  Your Jupyter Notebook cells will run any valid Python code.  

The Input Function
--

Fun fact: you can write a simple text-based choose-your-own-adventure game in Python using conditional statements, the print function, and the **input function**.

The input function asks for typed input from a user and incorporates that input into the program. Assign the input to a variable, and use that variable in your if-statements. Note that input values always start out as strings, so you'll need to convert them to integer or float values if you want to treat them as numbers. Try this:

    x = input('Pick a number.')  

    if int(x) > 3:  

      print("Too high.")  
   
    elif int(x) < 3:  

      print("Too low.")
        
    else:  

      print("That's a magic number!")


Strings
--

A string in Python is a **sequence** of characters enclosed by single or double quotation marks. A sequence is an excellent and elegant way to store data. Here are some examples of how you can treat a character string in Python as sequential data.

Try this:

    a = "waterlily"  

    print(len(a))

len() is the length function. You just asked Python to tell you the number of items - i.e., characters - in the sequence that is a.

Python will index the items in a sequence by number, starting with zero. That is to say, the first item in a sequence is indexed at 0. In the case of the string 'waterlily', the item at index [0] is 'w'. Use square brackets to enclose the index number.

Try these examples:

    print(a[0])  

    print(a[4])  

    print(a[-1])  


Note that using an index of [-1] can be a convenient way to find the last item in a sequence.


You can use a variation of indexing to return a slice of items in a sequence; this is called... slicing. Give Python an index range with endpoints separated by a colon; Python will begin at the first index, and return items up to but not including the second index. If you leave one side blank, Python will automatically start at the beginning or continue through the end of the sequence.

    print(a[2:5])  

    print(a[:6])  

    print(a[3:])
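
For comparison, here is a sketch of the same slicing rules applied to a different, made-up string, with each result noted in a comment:

```python
word = "notebook"

first_half = word[0:4]   # indexes 0, 1, 2 and 3, but not 4 -> "note"
second_half = word[4:]   # index 4 through the end -> "book"
last_three = word[-3:]   # negative indexes count from the end -> "ook"

print(first_half, second_half, last_three)
```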

There are many useful built-in methods ([methods are functions that "belong" to an object](https://www.w3schools.com/python/gloss_python_object_methods.asp)) that you can use to work with strings! You use methods by attaching them to appropriately-typed objects with a period. For example, if you want to **replace** every instance of one character in a string with a different character, there's a method for that:

```
sentence = "This_is_a_sentence_but_for_some_reason_it_has_underscores,_not_spaces."

sentence = sentence.replace("_", " ")

print(sentence)
```

The **replace** method requires two arguments: first, the character to be replaced, and second, the character to replace it with.

There's also a built-in method to search strings for the index of a sub-string (a shorter string contained inside). What if you had a big block of text and wanted to **find** something specific?

```
sentence = "Once upon a time there was a little puppy named Lainey."

puppy = sentence.find("Lainey")

print(puppy)
```

The **find** method only requires one argument: the sub-string to search a larger string for. See what happens if you try to find a sub-string that doesn't appear inside the larger string!

Or what if you wanted to **split** a string into its component parts? As you've probably guessed by now, there's a method for that too:

```
sentence = "Sentences are just lists of words!"

words = sentence.split(" ")

print(words)
```

The **split** method takes one argument: the character that you want to use to split the string. It returns a *list* containing each sub-string that appears between occurrences of that character. More on lists very soon!
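
The separator doesn't have to be a space. As a rough sketch with a made-up string, here is the same method splitting on commas instead:

```python
colors = "red,green,blue"

color_list = colors.split(",")   # split wherever a comma appears

print(color_list)        # ['red', 'green', 'blue']
print(len(color_list))   # 3
```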

# Lists

*Lists* are one of the most common and useful Python data structures around. They're exactly what they sound like: a list of *elements* that can be of any data type (and don't all have to be of the same type). Each element in a list has an index, and lists are zero-indexed like strings, so the first element in a list has an index of 0. Lists are created by enclosing elements in square brackets and separating them with commas:

```
my_first_list = [1, 2, 3]

print(my_first_list)
```

How would you print out only the second element in this list?

You can use the **append** method to add an element to the list:

```
my_first_list.append(4)

print(my_first_list)
```

Or the **pop** method to remove a specific element, specified by its index, from the list:

```
my_first_list.pop(2)

print(my_first_list)
```

What do you think will be the value of the element at index 2 in this list after you enter and run all four of the lines above?

If you have a long list and are looking for the **index** of a specific element, you can do that too:

```
numbers = [x for x in range(100)]

print(numbers.index(45))
```

The way we created the list here uses an intermediate technique called a [list comprehension](https://www.w3schools.com/python/python_lists_comprehension.asp). See what happens if you try to use **index** to find the position of an element that isn't actually in the list!
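
One way to guard against a missing element is to check for it first. The sketch below uses the *in* operator, which this notebook hasn't covered yet: it evaluates to True if a value appears in a list. The list contents and variable names are made up for illustration.

```python
pets = ["dog", "cat", "squid"]
target = "mongoose"

# Only call index() when we know the element is there
if target in pets:
    message = target + " is at index " + str(pets.index(target))
else:
    message = target + " is not in the list"

print(message)  # mongoose is not in the list
```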

# Dictionaries

If you have data that needs just a bit more context than a list can provide, you can use another extremely useful Python data structure called a *dictionary*. Dictionaries allow you to pair pieces of information, conventionally known in this context as *values*, with labels that describe them, conventionally known in this context as *keys*. Dictionaries are created by enclosing key-value pairs with curly brackets. Once you've created a dictionary, you can access the value of specific keys using index notation. For example:

```
my_pet = {"name": "Lainey", "species": "dog", "age": 0, "friendly": True}

print(my_pet["name"])
```
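
The same index notation can also change a dictionary after it's created. Here is a minimal sketch using a hypothetical pet, so it doesn't alter the my_pet dictionary above:

```python
pet = {"name": "Biscuit", "species": "cat", "age": 2}

pet["age"] = 3                   # change the value stored under an existing key
pet["favorite_toy"] = "string"   # assigning to a new key adds a key-value pair

print(pet["age"])   # 3
print(pet)
```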

Best of all, dictionaries and lists can work together! You could have a list of dictionaries or (less frequently) a dictionary of lists:

```
my_other_pet = {"name": "Tyrion", "species": "dog", "age": 9, "friendly": "kind of"}

my_pets = [my_pet, my_other_pet]

print(my_pets[0], my_pets[1])
```



The true power of Python starts to become clear when you learn how to combine different data structures with things like if statements or *loops* to accomplish increasingly complicated tasks. Speaking of loops...

For loops
--

Python can also iterate through items in a sequence and perform operations on them using an amazing logical structure called a *for loop*.  

The basic syntax of the for loop is simple. You start with a for statement and a colon, then skip to a new line, indent, and outline the action you want to happen with each item in the sequence.

    for [item] in [sequence]:
      *do something*

In this example, [item] is a variable name for the items in this sequence. Each item in the sequence will be assigned to this variable in turn. This kind of variable is called an iterator variable.  

Note that you need to define the sequence before writing the for loop, but the iterator variable is defined for the first time within the for loop.

Try this example:

    b = "bananas"  

    for letter in b:  

      print(letter)

You can make this more complex, more silly, or both.  Try this version:
    
    b = "bananas"  

    for letter in b:  

      print(letter + 'aa')
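
The same syntax works for lists, since a list is also a sequence. As a sketch with made-up numbers, this loop visits each element of a list and keeps a running total:

```python
prices = [3, 5, 2]
total = 0

for price in prices:
    total = total + price   # add the current element to the running total

print(total)  # 10
```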

For loops are a common way to iterate through the items of a sequence and count them. For this next operation, we'll use a counter variable to count the number of times the letter 'a' appears in the word 'bananas'.

1.   *word* is our variable name for the character sequence 'bananas'
2.   *letter* is the iterator variable
3.   *a_count* is the counter variable; we'll add 1 to this for each 'a' the iterator encounters as it loops through the letters in *word*.

Try this code:

    word = 'bananas'  

    a_count = 0  

    for letter in word:  

      if letter == 'a':  
    
        a_count += 1
        
    print('the letter a appears', a_count, 'times in the word', word)
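
If you wanted to count every letter at once, one possible approach is to keep the counters in a dictionary instead of a single variable. This is only a sketch: letter_counts is a made-up name, and the *in* check (which tests whether a key already exists in the dictionary) is something this notebook doesn't otherwise cover.

```python
word = 'bananas'
letter_counts = {}   # keys are letters, values are how many times each appears

for letter in word:
    if letter in letter_counts:
        letter_counts[letter] += 1   # seen before: add 1 to its count
    else:
        letter_counts[letter] = 1    # first time: start its count at 1

print(letter_counts)  # {'b': 1, 'a': 3, 'n': 2, 's': 1}
```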

# While loops

If you'd like for a loop to execute until a certain condition is met rather than for all of the items in a sequence, you can use a *while loop* instead. The basic syntax is similar:

```
while *some condition is true*:
  *do something*
```

The indented block of code will run repeatedly until the condition in the loop declaration is no longer true. For example:

```
treats_lainey_has_eaten = 0

while treats_lainey_has_eaten < 10:
  print("Lainey ate another treat!")
  treats_lainey_has_eaten += 1
```

It's particularly easy to accidentally write a while loop that never ends! These kinds of loops, known as infinite loops, will run repeatedly until you tell your Python interpreter to stop (in Colab, click the stop button on the running cell):

```
iterations = 0
stop = False

while not stop:
  iterations += 1
  print("The loop has executed " + str(iterations) + " times.")
  #if iterations == 10:
    #stop = True
```

Now remove the pound signs from the two lines above and run the cell again. You can (and should) use pound signs to add comments to your code as you write it, but be careful not to put them in front of lines that you actually need to run.

This code snippet introduces two other new concepts:

1.   The *not* operator, which inverts a boolean value
2.   The *str* function, which converts a value to a string - if you think back to one of our first exercises, we used the similar *int* function to convert a value into an integer
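
Here is a short sketch of both ideas in isolation, using made-up variable names. Note that joining text with + only works when every piece is a string, which is where str comes in.

```python
is_raining = False
print(not is_raining)   # True, because not inverts False

treat_count = 3
# "Lainey ate " + treat_count would raise an error, so convert the integer first
message = "Lainey ate " + str(treat_count) + " treats."
print(message)  # Lainey ate 3 treats.
```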

As with lists and dictionaries above, there's no reason that you can't combine different kinds of loops:

```
months = [{"name": "august", "length": 31}, {"name": "september", "length": 30}, {"name":"october", "length": 31}]

for month in months:
  day = 0
  while day < month["length"]:
    day += 1
    print(month["name"], day)    
```

Let's break down what's happening here. First, our for loop is iterating across the three elements in our list, each of which is a dictionary describing a month of the year with a couple of key-value pairs in it.

On each iteration of the for loop, our nested while loop will run and print out the days of each month until it reaches the "end" of that month, as defined by its "length" key.

Can you modify the code to include another month?

What if we only wanted to print out the even-numbered days of each month? There are many different ways to achieve that, but one makes use of the *continue* keyword. Any time our Python interpreter hits a continue inside of a loop, it will immediately move on to the next iteration of that loop. For example:

```
months = [{"name": "august", "length": 31}, {"name": "september", "length": 30}, {"name":"october", "length": 31}]

for month in months:
  day = 0
  while day < month["length"]:
    day += 1
    if day % 2 != 0:      
      continue
    print(month["name"], day)    
```

We've substantially changed how this code behaves simply by adding an if statement! That if statement uses a couple of new concepts to allow us to skip odd days of the month:

1.   The modulo operator (%), which returns the remainder when the integer on the left is divided by the integer on the right. What's the remainder when an even number is divided by 2?
2.   The not equal to operator (!=), which returns the logical inverse of the equal to operator (==) that we learned about earlier.

Together, these operators give us an if statement that will trigger for all odd days of the month, and when it does, we hit a continue keyword, which immediately skips to the next iteration of the loop without executing any more code.
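
To make the two operators concrete, here is a sketch that applies them to a single made-up day number, one step at a time:

```python
day = 7

remainder = day % 2        # 7 divided by 2 is 3 with 1 left over
is_odd = remainder != 0    # True whenever the remainder is not zero

print(remainder)  # 1
print(is_odd)     # True
```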

If, instead of skipping to the next iteration, we wanted to stop the loop entirely as soon as a certain condition is met, the keyword that we would want to use is *break*. Can you modify the loop so that it only prints out the first 10 days of each month?
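
As a hint, here is a sketch of break in a different, made-up setting: searching a list and stopping at the first number bigger than 10. The remaining elements are never visited.

```python
numbers = [3, 8, 15, 4, 23]
checked = 0

for number in numbers:
    checked += 1
    if number > 10:
        print("Found a number bigger than 10:", number)
        break   # leave the loop immediately; 4 and 23 are never checked

print("Numbers checked:", checked)  # Numbers checked: 3
```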

# Defining functions

If there isn't a built-in function or method that does what you want to do, you can make your own! You're probably sensing the pattern by now, but the basic syntax to define your own function is quite simple:

```
def *your function name*(*your function parameters*):
  *do something*
  return *some value* <- this is optional
```

For example:

```
def add_two_numbers(x, y):
  return x + y

print(add_two_numbers(4, 5))
```

What happens if you change the code above and instead pass one string and one integer to your function as arguments? ("Parameters" are the input variables that you ask for in your function definition, and "arguments" are the actual values that you supply when calling the function.) This is one of the many reasons why documenting your code is important!

Any parameter that appears in a function definition without a default value, like x and y in the code snippet above, is required. You can also include additional parameters with default values, which then become optional:

```
def add_two_or_three_numbers(x, y, z = 0):
  return x + y + z

print(add_two_or_three_numbers(4, 5))
print(add_two_or_three_numbers(4, 5, z = 6))
```

Note that here we supplied the optional parameter by name (z = 6), which makes it clear which parameter the value is for. Python also accepts it by position, so add_two_or_three_numbers(4, 5, 6) returns the same result.

But why would you want to go through the trouble of defining functions rather than simply writing the code in the middle of your script? There are many reasons! Some of the important ones include avoiding the duplication of code, making your code more legible, or making it easier to share your code. Can you think of others?
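
To illustrate the point about avoiding duplication, here is a sketch that wraps the letter-counting loop from the for loops section in a function, so it can be reused on any word. The function name count_letter is made up for this example, and the triple-quoted line is a "docstring", a common way of documenting what a function does.

```python
def count_letter(word, letter):
    """Return how many times letter appears in word."""
    count = 0
    for character in word:
        if character == letter:
            count += 1
    return count

# One definition, reused three times
print(count_letter('bananas', 'a'))     # 3
print(count_letter('waterlily', 'l'))   # 2
print(count_letter('mongoose', 'o'))    # 3
```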

# Importing packages

Python has a ton of functionality built in, and the ability to write your own functions extends the range of things you can do even further, but what if someone else has written some code for a common use case that you'd like to be able to use? Luckily, this is easy to do! And because of the wide adoption of Python, [**packages** exist for just about anything that you're interested in doing](https://pypi.org/).

The [pip package manager](https://pypi.org/project/pip/) is a tool that is commonly used to install and manage Python packages. If/when you're writing Python outside of a Jupyter Notebook, you'll typically interact with pip on the command line, but here we can interact with it directly through the Colab interface.

You can do so using this syntax:

```
!pip install *package name*
```

The exclamation mark tells Jupyter that the line of code that follows should be interpreted as a terminal command. Try this:

```
!pip install wordcloud

import wordcloud

words = "these are some words words words to make a word cloud cloud out of"

wc = wordcloud.WordCloud(background_color="white", max_words=5000, contour_width=3, contour_color='steelblue')

wc.generate(words)

wc.to_image()
```

Don't worry if you don't understand all of the syntax in this snippet; the key takeaway is to see how we're installing and importing the wordcloud package.

If you don't want to import all of the functions in a package, you can use this syntax:

```
from *package name* import *function name*
```

For example, this code will work the same as the snippet above:

```
#!pip install wordcloud

from wordcloud import WordCloud

words = "these are some words words words to make a word cloud cloud out of"

wc = WordCloud(background_color="white", max_words=5000, contour_width=3, contour_color='steelblue')

wc.generate(words)

wc.to_image()
```

The first line is commented out because if you already installed wordcloud during this Colab session you won't need to again. Notice how the line where we call the WordCloud function changed. Can you figure out why this is?

Finally, if you want to change (usually shorten) the names of packages or functions that you're importing, you can use the **as** keyword. This code will also work the same as the two previous snippets:

```
# !pip install wordcloud

from wordcloud import WordCloud as wc

words = "these are some words words words to make a word cloud cloud out of"

wc = wc(background_color="white", max_words=5000, contour_width=3, contour_color='steelblue')

wc.generate(words)

wc.to_image()
```

Again, the only change is to the line where we call the WordCloud function. What does "wc" refer to on the left side of the equal sign? What does it refer to on the right side?
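
The three import styles can also be compared using math, a package that ships with Python and so needs no pip install. This sketch assumes nothing beyond the standard library; the alias m is an arbitrary choice.

```python
import math              # import the whole package
from math import sqrt    # import one function from it
import math as m         # import the whole package under a shorter name

print(math.sqrt(16))   # 4.0, function reached through the package name
print(sqrt(16))        # 4.0, function used directly
print(m.sqrt(16))      # 4.0, function reached through the alias
```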

# Reading and writing files

Since we're in the business of working with text here, how might we actually go about interacting with a text file? We can do that with the built-in *open* function:

```
with open(*path to your file*, "r") as infile:
  *read the file*
```

Here we pass the open function two arguments: the path to the file that you want to open and a flag telling Python what you want to do with the file. We've passed in "r" as our flag to indicate that we want to *R*ead the file. You can also pass in an optional "encoding" argument if you know what form of [character encoding](https://en.wikipedia.org/wiki/Character_encoding) your file uses.

In order to try reading a file, we're going to need a file to read. You can either create your own text file or [download this one](https://drive.google.com/file/d/1vK96xMfgBokhWHjANAGmn1k-WNuso_Bk/view?usp=sharing). Once you have a file ready, click on the folder icon on the left side of the screen and upload the file to the Colab environment. Then try:

```
with open(*path to your file*, "r") as infile:
  for line in infile:
    print(line)
```

Recall the difference between absolute paths and relative paths. It's important to get comfortable navigating the file structure on your computer! For now, though, since you've uploaded the text file directly to your Colab environment, you can simply substitute the name of the file (including the file extension) as a string for "path to your file" in the code snippet.

If everything worked correctly, you should see the lyrics to "Baby Shark" output above. You're welcome.

Last but not least, what if we wanted to write to a file rather than read it? In that case, all that we need to do is change "r" to "w" when we call the open function:

```
path = *path to your file*

with open(path, "w") as outfile:
  outfile.write("Hello world!")

with open(path, "r") as infile:
  for line in infile:
    print(line)
```

Note how we save the path to the file as a variable so that we don't have to write it out twice, which also improves the readability of our code.

Our file only contains "Hello world!" now because opening a file with the "w" flag erases all of its existing content before writing. You can download the updated file to confirm this. Be careful not to overwrite important information!
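
If you want to add to a file instead of replacing its content, the open function also accepts an "a" flag, for *A*ppend. The sketch below contrasts the two flags. It writes to a made-up file name inside a temporary folder so that nothing important gets overwritten, and it uses the read method, which returns the whole file as one string.

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary folder
path = os.path.join(tempfile.mkdtemp(), "demo_notes.txt")

with open(path, "w", encoding="utf-8") as outfile:
    outfile.write("first line\n")

with open(path, "w", encoding="utf-8") as outfile:
    outfile.write("second line\n")   # "w" erased "first line"

with open(path, "a", encoding="utf-8") as outfile:
    outfile.write("third line\n")    # "a" added to the end instead

with open(path, "r", encoding="utf-8") as infile:
    contents = infile.read()

print(contents)
```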

# Conclusion

If you have a solid grasp of the concepts covered in this notebook, you're more than a Python novice - you genuinely have all of the information that you need to do some very complicated things. The challenge (and power) of using Python, or any other programming language, comes in understanding how best to leverage and combine different data structures, functions, and logical constructs to achieve a particular purpose.

Once you're comfortable with the basics, the best way to learn more is to go write some code. Programming knowledge scales exponentially, as each thing you figure out how to do unlocks multiple others in combination with the other things that you already know how to do!