# Getting Started

To get started with the notebook below, please follow the [README](./README.md) guide in this folder. Once you've accomplished the steps in that README, you'll be all set to start learning Python below!

# Introduction to Python

If you're interested in learning a programming language, Python is an excellent choice. It's an ideal language for beginners (and advanced users), as it's powerful, flexible, and equipped with packages that can help you accomplish nearly any task you might wish to solve using a computer. These are only some of the reasons Python recently became the most popular programming language in the world:

<img src='./assets/python-popularity.jpg' style='margin: 10px auto; width: 400px'>

This workshop will guide you through your first steps with Python. We will introduce some of the most important concepts in the language and learn how to use Python to accomplish real-life research tasks. Let's dive in!

# What Can You Do With Python?

Before spending time studying Python, you might ask yourself: What can I do with Python? The truth is that nearly anything that can be done with a computer can be done with Python. For example, you might use it to:

### Collect Data

Perhaps you would like to study the influence the King James Bible had on *Moby Dick*, or you would like to study patterns in Napoleon's letter writing practices. All digital projects start with data collection, and Python can help you collect or "scrape" data from the internet (always check copyright status!). 

<img style='width: 400px' src='./assets/scraping-data.gif'>
<div style='width: 600px; margin: 10px auto; color: #888; font-family: arial; font-size: 12px;'>Selenium and BeautifulSoup are two of the most popular tools for collecting data from the web, and both are written in Python.</div>

### Analyze Data

Python is an awesome language for data analysis. Maybe you want to study semantic shifts in periodical collections over the course of the twentieth century. Python can help you study such patterns at scale. 

<img style='width: 400px' src='./assets/rap-vocab.png'>
<div style='width: 600px; margin: 10px auto; color: #888; font-family: arial; font-size: 12px;'>"The Largest Vocabulary In Hip Hop", an article on <a href='https://pudding.cool'>https://pudding.cool</a>, uses simple data analysis to identify the vocabulary size of various popular rappers, a task that one could achieve with Python.</div>

### Machine Learning

Python has become one of the most popular languages for conducting machine learning and artificial intelligence (AI) research in both textual and visual domains. 

<img style='width:400px' src='./assets/reinforcement-learning.gif'>
<div style='width: 600px; margin: 10px auto; color: #888; font-family: arial; font-size: 12px;'>Reinforcement learning algorithms&mdash;the engines that allow algorithms to play games like Chess, Go, or Super Mario Brothers&mdash;are one of many areas in machine learning one can study using Python.</div>

### Make Gadgets

Gadget makers use the extraordinarily small and affordable Raspberry Pi microcomputer, which can be programmed in Python, to build electronic devices like Bluetooth-enabled speakers and interactive art displays.

<img style='width:400px' src='./assets/treat-dispenser.png'>
<div style='width: 600px; margin: 10px auto; color: #888; font-family: arial; font-size: 12px;'>Tech enthusiasts and cat lovers alike might wish to know that a <a href='https://www.adafruit.com/product/2885'>$5 Raspberry Pi computer</a> can be programmed in Python and can be used to create homemade arcade games, security systems, and cat treat dispensers such as the one pictured above.</div>

### Create Websites

The Python ecosystem is home to several popular frameworks for building websites, including Flask and Django. Major companies you know and love have used these frameworks in their websites. Flask is used by organizations including Reddit, Netflix, and Airbnb. Django is used by companies including YouTube, Instagram, and Spotify.

<img style='width:400px' src='./assets/netflix.gif'>
<div style='width: 600px; margin: 10px auto; color: #888; font-family: arial; font-size: 12px;'>Netflix.com is built on Flask, a popular, easy to use server framework written in Python.</div>

# Getting Started

### Variables

Variables are the core building blocks of computer programming. A variable is a sequence of characters to which we assign a value or meaning using the = sign. Let's look at an example:

In [None]:
my_cat = 'cheshire'

Here `my_cat` is the variable, and `'cheshire'` is the value assigned to that variable. In general in Python, the thing on the left hand side of the equals sign is the variable, and the thing on the right hand side is that variable's value.

When creating a variable, keep in mind that **variable names**:
- can only include letters, numbers, and underscores
- must start with a letter or underscore (variables that start with underscores have a special meaning, so for now, we'll start all of our variables with letters)
- are case sensitive.

Given the above rules, try to anticipate which cells below will return an error message:

In [None]:
caterpillar_question = 'Who are you?'

In [None]:
_caterpillar_question = 'Who are you?'

In [None]:
caterpillar_Question1 = 'Who are you?'

In [None]:
1caterpillar_question = 'Who are you?'

Variables can be reassigned in Python. We might assign a value to a variable, then later assign a different value to that variable:

In [None]:
my_cat = 'cheshire'
my_cat = 'grinning'

Here `my_cat` is given an initial value `'cheshire'`, but is then given a new value `'grinning'`. 

### Printing

To check the value assigned to a variable at any given moment, you can use the `print()` function:

In [None]:
my_cat = 'cheshire'
print(my_cat)

The `print()` command is a type of function. We can recognize `print()` as a function because it has parentheses after the name. In general, **functions** are little pieces of code that take **arguments** (the values enclosed in parentheses) as input and do something with those arguments. The `print()` function, for example, takes as its argument a variable like `my_cat`, then displays the value assigned to that variable. We will explore functions in greater depth below. For now, we just want to note that `print()` is a function, which we can recognize by the presence of parentheses after the function's name.
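As a small illustrative sketch (the variable names `my_rabbit` and `name_length` are made up for this example), the cell below uses `print()` to check a variable's value before and after reassignment. It also tries a second built-in function, `len()`, which takes its argument in parentheses just like `print()` does:

```python
# assign a value, then display it
my_rabbit = 'white'
print(my_rabbit)

# reassign the variable; print() now shows the new value
my_rabbit = 'late'
print(my_rabbit)

# len() is another function: it takes a string and returns its length
name_length = len(my_rabbit)
print(name_length)
```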

See if you can use the `print()` function to tell what value is assigned to the following variable:

In [None]:
# see if you can display the value assigned to `my_motto` below
my_motto = "It's no use going back to yesterday" + ", " + "because I was a different person then."

### Introspection

Just as we can use the `print()` function to see the value assigned to a variable, we can use the `dir()` function to see the kinds of things we can do with a variable. Brace yourself, though: the `dir()` command displays lots of options!

In [None]:
title = 'Alice in Wonderland'
dir(title)

Running the `dir()` function returns a list of the names, mostly methods, that are defined on the provided variable. In the example above, we can see that `title` has dozens of methods defined. **Methods** are similar to functions, except they're tied to the variable (or object) you're working with. You can call a method using dot notation. Let's look at a few examples:

In [None]:
title.upper()

In [None]:
title.lower()

We can see that the `upper()` method lets us uppercase our title, and the `lower()` method lets us lowercase our title. We will use more methods in just a moment. For now, we just want to remember that the `dir()` command tells us which methods can be called on variables.
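Here is one more hedged sketch of dot notation, using a separate variable named `book` (our own choice for this example) so that `title` is left alone. It shows that some methods take arguments, and that because `dir()` returns a list of names, we can ask whether a particular method name is in that list:

```python
book = 'Through the Looking-Glass'

# methods are called with dot notation: variable.method()
swapped = book.swapcase()
print(swapped)

# some methods take arguments, just like functions do
o_count = book.count('o')
print(o_count)
print(book.startswith('Through'))

# dir() returns a list of names, so we can check whether a method is in it
has_upper = 'upper' in dir(book)
print(has_upper)
```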

See if you can use the `dir()` command to identify the methods available on the variable below:

In [None]:
# use the `dir()` command to identify the methods available on the variable below
song = '''
  How doth the little crocodile
  Improve his shining tail,
  And pour the waters of the Nile
  On every golden scale!
'''

### Data Type: Strings 

All of the variables we have seen so far have been examples of "strings", another way of referring to text data. We know these variables are strings because their values are wrapped in quotation marks.

Can you explain the error message we get if we try to assign text data as a value without using quotation marks?

In [None]:
# the code below will result in an error
main_character = Alice

# fix the above code here

With string data, we often need to process it to find specific bits of interest, or to clean extraneous characters. Let's practice processing strings below.

In [None]:
opening = '''fAlice was beginning to get very tired of sitting by her sister on the bank,
and of having nothing to do. Once or twice she had peeped into the book her
sister was reading, but it had no pictures or conversations in it, ‘and what is
the use of a book,’ thought fAlice ‘without pictures or conversation?’'''

The data in the string above looks good, except "Alice" is prefaced with an "f" for some reason. Let's clean that up with the `replace()` method, which replaces all instances of the first argument with the second argument:

In [None]:
opening = opening.replace('fAlice', 'Alice')
print(opening)
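Notice that the cell above assigns the result of `replace()` back to `opening`. That matters because `replace()` does not change the original string; it returns a new one. The sketch below illustrates this with a separate variable (`greeting` and `shouted` are names we made up for the example):

```python
greeting = 'Curiouser and curiouser!'

# replace() returns a new string; the original is left untouched
shouted = greeting.replace('curiouser', 'CURIOUSER')
print(greeting)
print(shouted)

# replace() is case sensitive, so 'Curiouser' (capital C) was not changed

# to keep a change, assign the result back to a variable
greeting = greeting.replace('!', '?')
print(greeting)
```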

### Data Type: Lists

A **list** is a variable that contains a sequence of items. Just as strings are encased in quotation marks, lists are represented by square brackets, with commas separating the different items in the list. Here is an example list:

In [None]:
facts = ['four times five is twelve', 'four times six is thirteen']

Just like strings, lists have their own set of methods we can use. Let's practice the `append()` method, which adds an item to a list:

In [None]:
facts.append('and four times seven is–oh dear!')
print(facts)

See if you can add a new fact to our list of "facts" below:

In [None]:
# add a new fact to our list of "facts" here

# print the new list


To access an item in the list, we use that item's index value. Keep in mind, Python starts counting at 0! So, if we want to print 'four times five is twelve' (which is the first item in our `facts` list), we would type:

In [None]:
print(facts[0])
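To make index counting a little more concrete, here is a short sketch with a different list (`guests` is a made-up example, not part of the workshop data). It shows the built-in `len()` function, a couple of ordinary indexes, and a negative index, which counts from the end of the list:

```python
guests = ['Hatter', 'March Hare', 'Dormouse']

# len() tells us how many items the list holds
print(len(guests))

# index 0 is the first item, index 1 is the second
print(guests[0])
print(guests[1])

# negative indexes count from the end: -1 is the last item
print(guests[-1])
```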

Try to access 'and four times seven is–oh dear!' from `facts`:

In [None]:
# print 'and four times seven is–oh dear!' from the facts list


Now that we know a bit about lists, let's go back to the opening paragraph of *Alice in Wonderland*, which we've saved in the variable `opening`. Suppose we only want to examine the first sentence in the paragraph. To accomplish this goal, we could use the `split()` method, which cuts a string into a list of strings based on a user-provided argument.

In [None]:
# the code below will turn the opening paragraph (currently stored as string data) into a list
opening_sentences = opening.split('.')
print(opening_sentences)
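The argument we pass to `split()` changes where the string is cut. As a rough sketch (the variable names `sentence`, `words`, and `parts` are our own), compare splitting with no argument to splitting on a specific character:

```python
sentence = 'Off with her head! Off with his head!'

# with no argument, split() cuts the string wherever there is whitespace
words = sentence.split()
print(words)

# with an argument, split() cuts wherever that text appears
# note the separator itself is removed, and the final item is an empty string
parts = sentence.split('!')
print(parts)
```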

In [None]:
# using what you know about lists and index numbers, print the first sentence of Alice in Wonderland here

### Reading Files

Many Python programs need to read files to load data you have collected into memory. Let's practice reading a file:

In [None]:
my_file = open('alice-in-wonderland.txt').read()

You'll notice that the line above starts with `open()`. `open()` is an example of a **built-in** function, which means it is available in every Python program without importing anything. Python has several dozen of these built-in functions, including the `print()` and `dir()` functions we saw above. For now we should note that `open()` returns a "file object" (sometimes called a file handle) that has a `read()` method. Calling that `read()` method lets us read the text content of a file into memory.
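One detail the one-line example leaves out is closing the file when we are done with it. A common Python idiom for this is the `with` statement, which closes the file automatically. The sketch below is only illustrative: it first writes a tiny file named `sample-note.txt` (a made-up filename, not part of the workshop materials) so that the cell can run on its own, then reads it back:

```python
# create a small sample file so this cell can run on its own
# ('sample-note.txt' is a made-up filename for this example)
with open('sample-note.txt', 'w', encoding='utf-8') as out_file:
    out_file.write('Drink me')

# open the file, read its text into memory, and close it automatically
with open('sample-note.txt', encoding='utf-8') as in_file:
    note = in_file.read()

print(note)
```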

Try downloading a plaintext file (i.e. a file with a .txt extension) from a site like Project Gutenberg, then read that file into memory below:

In [None]:
# try to load a file into memory here


<h3 style='color: green'>Review #1</h3>

By this point, we have covered many of the most important first steps with Python. But the hardest part of learning a programming language is learning how to piece these commands together to accomplish goals. Let's start practicing this skill with the following exercise.

See if you can read the file "alice-ocr.txt" into memory, then clean up the misspelled words in that file. You may need to investigate the file contents to determine which words are misspelled.

In [None]:
# type your code here

<img style='width: 100px' src='./assets/rabbit.jpg'>

### Loops

In our work above, we learned that lists are defined by square brackets, with commas separating each item in the list. We've learned that we can access individual list items using their index position, but oftentimes, it can be useful to automate that process of returning list items one by one (imagine if our list had 10,000 items!). We can accomplish this goal with a `for loop`.

To see for loops in practice, let's first create a new list and save it in the variable `lines`. Then, we'll run a for loop to iterate through each line in our list.

In [None]:
lines = [
  'How cheerfully he seems to grin',
  'How neatly spread his claws',
  'And welcome little fishes in',
  'With gently smiling jaws!'
]

In [None]:
for line in lines:
  print(line)

A **for loop** examines each item in a list from left to right (or top to bottom if you like). For loops are structured as follows:
  
```
for variable_name in list_name:
  actions_to_run_on_variable_name
```

Within the indented part of the loop (the part of the loop prefaced by whitespace), the value of `variable_name` will be equal to the current item in the list. Let's make this more concrete by discussing the loop we ran above.

In the first pass through the loop, the variable `line` has the value `'How cheerfully he seems to grin'`.   
In the second pass through the loop, the variable `line` has the value `'How neatly spread his claws'`.  
And so on. Each time we move through the loop, `line` takes on the value of the next item in the list.
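To watch those passes happen, here is a small sketch that keeps a running count as the loop moves through a list. The names `suits`, `suit`, and `pass_number` are invented for this example:

```python
suits = ['hearts', 'spades', 'clubs']

# keep a running count of how many passes we have made
pass_number = 0

for suit in suits:
    # everything indented here runs once per item in the list
    pass_number += 1
    print(pass_number, suit)

# this line is not indented, so it runs once, after the loop finishes
print('finished after', pass_number, 'passes')
```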

As we move through the loop, we can perform operations on each item in the list. Let's practice with another list:

In [None]:
lines = [
  'Beautiful Soup, so rich and green',
  'Waiting in a hot tureen!',
  'Who for such dainties would not stoop?',
  'Soup of the evening, beautiful Soup',
]

for line in lines:
  print(line.upper())

### Modules

One of the great things about the Python programming language is the fact that the Python ecosystem has many, many "modules" that will help expedite your work. A **module** is a collection of code that contains some variables or functions you can use. Some of these modules were authored by the creators and maintainers of the language itself, while others were written by other kind souls who have open sourced their code. Let's take a look at the `collections` module, a module included with Python that is useful in lots of data processing tasks.

In [None]:
import collections

word_counter = collections.Counter()

The lines above show the general pattern we follow to use modules in Python. First we must `import` the module we wish to use. That `import` line makes it possible for us to use the variables and functions defined in the module in our own code.
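You may also come across a second import style in other people's code. As a hedged sketch (the variable names `counter_a` and `counter_b` are just for illustration), the cell below shows both styles side by side and confirms that they lead to the same `Counter`:

```python
# form 1: import the whole module, then reach into it with dot notation
import collections
counter_a = collections.Counter()

# form 2: import just the name we need from the module
from collections import Counter
counter_b = Counter()

# both forms give us the same Counter from the collections module
print(collections.Counter is Counter)
```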

After importing the module, we can access the variables and functions defined within that module. For a full list of the variables and functions defined within the `collections` module, one can use the `dir()` function we discussed earlier:

In [None]:
dir(collections)

Now that we've imported the `collections` module and created a `Counter`, let's use that `Counter` to count some words:

In [None]:
poem = '''
  ‘The Queen of Hearts, she made some tarts,
  All on a summer day:
  The Knave of Hearts, he stole those tarts,
  And took them quite away!’
'''

# split the poem into a list of words
words = poem.split()

for word in words:
    word_counter[word] += 1

In [None]:
print(word_counter)

That's a little hard to read, but by using just one more module we can visualize our word counts:

In [None]:
%matplotlib inline

import helpers

helpers.plot_counts(word_counter)
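If you would rather summarize counts as text, a `Counter` also has a `most_common()` method. The sketch below uses a simplified, lowercased line with the punctuation removed (our own sample text, stored in made-up variables `rhyme`, `rhyme_counter`, and `top_words`) and shows that a `Counter` can count a whole list of words in one step:

```python
import collections

rhyme = 'twinkle twinkle little bat how I wonder what you are at'

# passing a list of words to Counter counts them all in one step
rhyme_counter = collections.Counter(rhyme.split())

# most_common(n) returns the n most frequent items as (word, count) pairs
top_words = rhyme_counter.most_common(1)
print(top_words)
```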

<h3 style='color: green'>Review #2</h3>

You've now made it quite far in your Python journey! Let's review what we discussed above with the following challenge.

See if you can read in `jabberwocky.txt`, count the number of times each word occurs in that file, and plot those counts. As you work to complete this challenge, you may find it helpful to subdivide the task into smaller subtasks and complete those subtasks one by one...

In [None]:
# type your code here


# Going Further

Congratulations! You've covered a lot of ground with the Python programming language. From here, you should be prepared to take a deeper dive into the language. 

For next steps, we recommend browsing the lessons on [Library Carpentry](https://librarycarpentry.org/lc-python-intro/) and [the Programming Historian](https://programminghistorian.org/):

<img src='./assets/programming-historian.png' style='width: 550px; margin: 10px auto'>

In addition to those lessons, feel free to stop by our [open Office Hours](https://dhlab.yale.edu/resources/office-hours.html) to discuss a project you might want to pursue with Python!