# Day 1: Welcome to TXT Group with Python

Our goals for this lesson are as follows:
* Introduce a few key variable types in Python: strings, integers, and lists
* Show how to manipulate strings through concatenation ("adding" strings), calculating length with len(), and the .format() method
* Introduce conditional logic: if/elif/else
* Use if statements to search for substrings

Today we will be introducing Python as a multipurpose tool for working with text. Python is a great language for jumping right into manipulating data, and using some of the creative techniques of programmatic thinking along the way.

Our goal today is to show how you can use a programming language to construct and deconstruct (or **parse**) search queries.

**Querying** is a method of requesting particular data from a larger collection or database, according to a set of conditions or **parameters** that are meaningful to us. Search queries are the underlying technique behind search bars of all kinds, from [library catalog searches](http://newcatalog.library.cornell.edu/) to Google's [new dataset search tool](https://toolbox.google.com/datasetsearch). Queries also have a special use case in fetching data from a public API, which you can learn more about in this [workshop guide](https://github.com/cornell-colab/fetching-data-from-apis).



## Getting started with Jupyter Notebooks

Today we will be using a Jupyter Notebook. 

* Create an account on [Azure](https://notebooks.azure.com/)
* Download this notebook as a [.ipynb file](https://raw.githubusercontent.com/cornell-colab/txt-group/master/Day1Notebook.ipynb)
* Upload the notebook to Azure (be sure to select Python 3 when prompted)

## Why querying?

Whether you are a student, researcher, librarian, or developer, you are already in the habit of seeking out data to answer questions that matter to you. This might look like: finding a specific e-mail in your inbox, typing search terms into a search engine, looking up historical materials in an archive, entering an address into a mapping app, searching for key words within a journal article, and so on.

Note that in each of these examples, we could type our search terms or our query into a search box, which is often visually represented on a web page, toolbar, or phone app interface.

Each of these queries usually results in one of the following: a specific point of data (such as the latitude and longitude of a restaurant), a sub-section of a longer document (like sentences in a journal article that match your search term), or a set of documents within a much larger collection (like a list of 900,000 search results from the hundreds of trillions of web pages indexed by Google). We might imagine that, behind the scenes, there is some process that filters and selects from a giant collection of data, perhaps structured into a database or many databases, and returns just the data that matches our search query conditions.


Search engines provide a visual interface for searching: the search box. But there are other times we may want to compose or alter queries directly. One important use case is a **Public API endpoint**.

## Example: Chronicling America

[Chronicling America: Historic American Newspapers](https://chroniclingamerica.loc.gov/)

[API Documentation](https://chroniclingamerica.loc.gov/about/api/)

https://chroniclingamerica.loc.gov/search/titles/results/?terms=ithaca




## Manipulating strings with Python

Python is an excellent multipurpose tool for working with texts, and gives us a new set of algorithmic/programmatic methods to work with. Today we are going to focus on **variable types** (in particular **strings**), a few **string-related functions**, and **conditional statements**.

But first! What's a variable?

A **variable** is a name in Python that stores a value we may wish to use later. We use the equals sign (=) to assign a variable a value.

Here are some examples of variables. Click into the box below and press the **run cells** key (either Shift+Enter or the little play button in the notebook header) to assign the variables their values.

In [None]:
how_many_articles = 5
name_of_headline = 'Players and Fans See Sexism in Serena Williams’s Treatment at U.S. Open'
on_front_page = True



What happened? In each line of code, Python (1) created a new variable, (2) gave the variable a name (for example, "how_many_articles"), (3) inferred the variable's type from the value we assigned (in this case, an **integer**), and (4) assigned that specific value (the integer 5) to the variable.

We saw three different types of variables in this example:
* 5 is an **integer**. Integers store whole numbers.
* 'Players and Fans See Sexism in Serena Williams’s Treatment at U.S. Open' is a **string**. Strings store a sequence of characters: letters, digits, spaces, and punctuation. (Technically, they store these characters in order, much like a **list**, which we will get to soon.)
* True is a **boolean**. Booleans can be either True or False -- their simplicity allows us to do some neat things, especially with conditional statements.
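
One way to check which type Python inferred is the built-in type() function. The sketch below uses hypothetical variable names and values (not the ones from the cell above) so it can be run on its own:

```python
# Hypothetical example values, chosen only for illustration
article_count = 5
headline = "Local Library Extends Its Hours"
is_featured = True

# type() is a Python built-in that reports the type of a value
print(type(article_count))  # <class 'int'>
print(type(headline))       # <class 'str'>
print(type(is_featured))    # <class 'bool'>
```

Here int, str, and bool are Python's names for integers, strings, and booleans.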



We can test to see that things behaved as we expected by using the **print() function**. A function is a special, modular piece of code that produces some kind of output, often based on whatever information we put inside of the parentheses. (This information is called a **parameter** or **parameters**.)

In [None]:
print(how_many_articles)

In [None]:
print(name_of_headline)

In [None]:
print(on_front_page)

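We can also reassign variables. A variable can hold whatever value we'd like; Python will even let us switch to a value of a different type, though it is generally good practice to keep each variable to one type. Below, we reassign on_front_page and create a new variable, name_of_article.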

In [None]:
name_of_article = "This Dublin Block Tells the Story of the City"
on_front_page = False

In [None]:
print(name_of_article)

In [None]:
print(on_front_page)
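
On the question of types when reassigning: Python itself will accept a value of a different type, so keeping a variable to one type is a habit rather than a rule. A minimal sketch, using a hypothetical variable called page_number:

```python
# page_number is a hypothetical variable used only for this illustration
page_number = 1              # starts out holding an integer
print(type(page_number))     # <class 'int'>

page_number = "A1"           # reassigned to a string; Python allows this
print(type(page_number))     # <class 'str'>
```

Switching types like this works, but it can make code harder to follow, which is why many programmers avoid it.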

Okay, so far so good! Let's try that name assignment again.

In [None]:
name_of_article = Viral Videos Are Replacing Pricey Political Ads, and They Work

What happened here? We encountered our first error. As intimidating as errors might look, they are valuable sources of information for us when writing code. Here, we see there is something called **invalid syntax**. Any clue of what could be going on here?

It turns out that strings have an important condition: they must be surrounded by quotation marks (single, double, or triple all work, as long as we're consistent). If we don't have quotation marks, Python tries to read the text as code, looking for a variable with that (long) name: Viral Videos Are Replacing Pricey Political Ads, and They Work. Not only does this variable not exist, but its spaces violate one of the naming syntax rules in Python: variable names can't have spaces in them.

Let's fix our work:

In [None]:
name_of_article = "Viral Videos Are Replacing Pricey Political Ads, and They Work"

In [None]:
print(name_of_article)
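
To illustrate the quoting rule, here is a short sketch showing that single, double, and triple quotes all produce the same string. The variable names (single, double, triple, with_apostrophe) are hypothetical, and the == comparison simply asks whether two values are equal:

```python
single = 'Viral Videos Are Replacing Pricey Political Ads, and They Work'
double = "Viral Videos Are Replacing Pricey Political Ads, and They Work"
triple = """Viral Videos Are Replacing Pricey Political Ads, and They Work"""

# All three variables hold exactly the same characters
print(single == double == triple)  # True

# Double quotes make it easy to include an apostrophe inside a string
with_apostrophe = "Today's front page"
print(with_apostrophe)
```

Being consistent means closing a string with the same kind of quotation mark that opened it.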

Today we're going to focus on strings like the one our name_of_article variable is currently storing. We'll get into some of the unique properties of strings in Python, one of which is **concatenation**.

What this means is that two strings can be "added" together, or combined back to back to form a new, longer string. And we can do this with variables holding strings, too.

In [None]:
# To run this code, click into this box and then click the Run button at the top of the screen

researchQ = "???"
print("A key question I'm curious about is: " + researchQ)

# Now try assigning a different value to researchQ. Run the code again to see a different answer
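
The lesson goals also mention len() and the .format() method. As a minimal sketch of how they sit alongside concatenation (the topic variable and its value are hypothetical):

```python
topic = "historic newspapers"   # hypothetical example value

# Concatenation joins two strings with +
sentence = "I am researching " + topic
print(sentence)

# len() counts every character in a string, including spaces
print(len(topic))  # 19

# .format() fills each {} placeholder in order, and accepts non-strings such as integers
summary = "My topic, '{}', is {} characters long.".format(topic, len(topic))
print(summary)
```

One design note: concatenating a string and an integer with + raises an error, whereas .format() converts the integer to text for us.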

## Composing queries from (sub)strings and variables

Let's return to our querying example above:

https://chroniclingamerica.loc.gov/search/titles/results/?terms=ithaca

Another way to represent this would be as a string stored in a variable:

In [None]:
newspaper_query = "https://chroniclingamerica.loc.gov/search/titles/results/?terms=ithaca"
print(newspaper_query)

Let's say, instead of doing this all in one go, we want to actually compose a query from two separate variables: **base** and **parameters**:

In [None]:
base = "https://chroniclingamerica.loc.gov/search/titles/results/?"
parameters = "terms=ann+arbor"

Below, write an expression that will concatenate these two variables and store the result in a variable called **newspaper_title_query**.

In [None]:
## Write your expression here

Run the code below to make sure your expression worked:

In [None]:
print(newspaper_title_query)

Great! Now try reassigning the parameters variable to something else in the space below, and then rerun the two cells above. (Feel free to play around with this a bit and paste it into your browser's address bar). 

In [None]:
# Write your code here
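
You may have noticed that the parameters string writes "ann arbor" as ann+arbor. URLs cannot contain spaces, and in a query string a + conventionally stands in for one. If you would rather not do this replacement by hand, the quote_plus() function from Python's standard urllib.parse module can do it. This is a sketch with hypothetical variable names (search_base, search_phrase, search_parameters, full_query) and a hypothetical search phrase:

```python
from urllib.parse import quote_plus  # part of Python's standard library

search_base = "https://chroniclingamerica.loc.gov/search/titles/results/?"
search_phrase = "new york"   # hypothetical search terms containing a space

# quote_plus() replaces spaces with + and escapes other special characters
search_parameters = "terms=" + quote_plus(search_phrase)
print(search_parameters)   # terms=new+york

# Concatenate the base and the encoded parameters into one query
full_query = search_base + search_parameters
print(full_query)
```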

## Inspecting a complex query with conditional statements

In [None]:
## To write: look at a long, complex query
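
As a starting point, and following the lesson goal of using if/elif/else to search for substrings, here is a minimal sketch. The query below is a hypothetical example: the terms parameter comes from the URL used earlier, while format=json is an assumption made for illustration (check the API documentation for what the service supports). The variable names long_query and query_description are also hypothetical.

```python
# Hypothetical query; the format=json parameter is assumed for illustration
long_query = "https://chroniclingamerica.loc.gov/search/titles/results/?terms=ithaca&format=json"

# The in operator checks whether one string appears inside another.
# Python tests each condition in order and runs only the first branch that is True.
if "format=json" in long_query:
    query_description = "asks for results as JSON"
elif "terms=" in long_query:
    query_description = "has search terms but no JSON format"
else:
    query_description = "has no search terms"

print("This query " + query_description)
```

Try removing &format=json from long_query and running the cell again to see the elif branch take over.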