# TXT Group Week 2: Listing, Splitting, and Slicing; or, the Things We Do in Brackets

## Introduction
Last session, we went over how to define a __variable__, the different types of data Python accepts, and how to set a conditional in the form of an __if statement__.

To review these quickly, we’re going to define the __variable__ _topic_ as the string “Letters”, _results_ as the string “?”, and _book_index_ as an especially long string. (I promise I'm not _string_-ing you along here.)

In [6]:
topic = "Letters"
results = "?"
book_index = "Chapter 1: Conversation Chapter 2: Dress Chapter 3: Traveling Chapter 4: How to Behave at a Hotel Chapter 5: Letters of Business Chapter 6: Letters of the Heart Chapter 7: Evening Parties as Hostess"

We want to see whether our topic appears within *book_index* so we can find which chapter we might want to reference later on. Using the __if statement__ below, we change the variable _results_ from "?" to one of two answers, depending on whether our condition is met.

In [4]:
if topic in book_index:
	results = "Yeah, there's a result. Or results? I really don't know"
else: 
	results = "Nah, I don't think there are any results."

print(results)

Yeah, there's a result. Or results? I really don't know


As _results_ shows us above, this if statement can't distinguish whether the word "Letters" appears once or a million times in *book_index*; its condition is satisfied by the first hit. By using __lists__ and __slices__, we are going to write code that gives us significantly more informative results.
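To see the difference that the if statement can't detect, here is a small sketch using the string method _count()_, which returns how many non-overlapping times a substring occurs. It redefines *book_index* so the cell runs on its own:

```python
# The same index string defined earlier in this notebook
book_index = "Chapter 1: Conversation Chapter 2: Dress Chapter 3: Traveling Chapter 4: How to Behave at a Hotel Chapter 5: Letters of Business Chapter 6: Letters of the Heart Chapter 7: Evening Parties as Hostess"

# `in` only answers yes or no
print("Letters" in book_index)        # True

# str.count() tells us how many times the substring appears
print(book_index.count("Letters"))    # 2
print(book_index.count("Chapter"))    # 7
```

The yes/no answer is the same whether "Letters" appears once or twice; only the count tells them apart.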

## Slicing

If we're going to build code that gives us more robust information, we first need to know what information we want. One thing we might want to know is which exact chapters match our topic. Scanning the *book_index* string, our eyes can quickly tell which chapters we want: typographically, the colon after each chapter number introduces that chapter's subject, so we can see that there are seven chapters in *book_index* and that two of them deal with letters.

In [43]:
print(book_index)

Chapter 1: Conversation Chapter 2: Dress Chapter 3: Traveling Chapter 4: How to Behave at a Hotel Chapter 5: Letters of Business Chapter 6: Letters of the Heart Chapter 7: Evening Parties as Hostess


But this is not about _our_ eyes. We may see seven chapters, but if we run the _len()_ function on *book_index*, Python outputs a very different number.

In [8]:
len(book_index)

198

If we want to know what those 198 characters are, we can __slice__ through *book_index*. Much like a piece of pie, a __slice__ is a portion of a whole, and we can cut as large a piece as we'd like, counting from either end. __Slicing__ is marked by brackets [ ] after a variable. Inside the brackets, a colon separates where the __slice__ starts from where it stops; the character at the stopping position is left out, which is why *book_index[0:10]* returns exactly ten characters. Negative numbers count backward from the end. If we leave one side of the colon blank, the __slice__ assumes we want everything from the beginning up to that point, or everything from that point to the end.

In [14]:
book_index[0:10]
#try substituting in different numbers. What happens when you leave certain coordinates blank?#

'Chapter 1:'

In [15]:
book_index[-10:]

'as Hostess'

So, if we want to know what the first character in *book_index* is, we type *book_index[0]*. If we want the last character, we write *book_index[-1]*. If we only want to see a section of *book_index*, we **slice** as follows: *book_index[3:8]*.
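Here are those three expressions side by side, plus a slice with a blank start, as a runnable sketch. It redefines *book_index* so it works on its own:

```python
book_index = "Chapter 1: Conversation Chapter 2: Dress Chapter 3: Traveling Chapter 4: How to Behave at a Hotel Chapter 5: Letters of Business Chapter 6: Letters of the Heart Chapter 7: Evening Parties as Hostess"

print(book_index[0])          # C, the first character
print(book_index[-1])         # s, the last character (negative positions count from the end)
print(repr(book_index[3:8]))  # 'pter ', positions 3 through 7; position 8 is left out
print(book_index[:7])         # Chapter, a blank start means "from the beginning"
```

_repr()_ is used on the middle slice only so the trailing space stays visible in the output.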

This may not seem that important given that *book_index* is pretty readable as is. But let’s imagine instead that *book_index* is actually a collection of indices from eight hundred self-help and etiquette books. Reading all of that data as a single string would be untenable. Even if we used Command/Control-F, we wouldn’t intuitively know which book we were looking at.

## Splitting 

Thankfully, Python has a built-in way to split a string wherever a common pattern occurs. The _.split()_ method takes a string and breaks it into pieces at every occurrence of the separator we give it, such as the "/" in the birthday example below. Which word or character would we want to __split__ *book_index* by if we wanted the length of the result to be 7? Or, which word occurs seven times in valuable places?

In [6]:
my_birthday = "11/18/1991"
my_birthday_n = my_birthday.split("/")
b_month = my_birthday_n[0]
b_date = my_birthday_n[1]
b_year = my_birthday_n[2]

print(b_month)

11


In [46]:
#finish the following code: type the separator between the quotes (an empty separator like "" raises a ValueError)#
book_index_n = book_index.split("")
len(book_index_n)

8

That should have worked! So why does *len(book_index_n)* come out as 8? We have a blank entry: because *book_index* begins with the very separator we split on, the empty text before that first separator becomes an entry of its own. There are a few ways to get rid of this entry, but I'd like to focus on one method that uses slicing. If the first entry is blank, how could we use slicing to keep only the relevant entries?
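The same thing happens with any string that starts with its separator. This sketch uses a made-up string, not *book_index*, so it doesn't give away the answer above:

```python
# A made-up string that begins with the separator we split on
shopping = "/bread/wool/bathbombs"
parts = shopping.split("/")

print(parts)       # ['', 'bread', 'wool', 'bathbombs']
print(len(parts))  # 4: the empty text before the first "/" becomes its own entry
```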

In [47]:
#finish the following code#
book_index_n = book_index_n[:]

type(book_index_n)

list

You may notice that when we __slice__ through *book_index_n*, each entry contains far more than a single character. Using the *type()* function, we find out that *book_index_n* is a **list**, whereas _topic_ and *book_index* are strings.
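The birthday example from earlier shows the same contrast on a smaller scale: indexing a string gives back one character, while indexing the list that _split()_ returns gives back a whole entry.

```python
my_birthday = "11/18/1991"
my_birthday_n = my_birthday.split("/")

print(my_birthday[0])       # 1, one character of the string
print(my_birthday_n[0])     # 11, one whole entry of the list
print(type(my_birthday))    # <class 'str'>
print(type(my_birthday_n))  # <class 'list'>
```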

## Lists

A **list** is an ordered collection of items that can grow and shrink. Like __slices__, **lists** are written with brackets. A single **list** can actually hold a mix of data types, but strings and integers don't mix in operations such as _+_: Python won't add a number to a string. In the example below, some strings are converted into integers for mathematical operations, and the result is converted back into a string for printing.
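Python itself will accept a list that mixes types; where strings and integers really refuse to mix is in an operation like _+_. A minimal sketch with made-up values:

```python
# A list may hold mixed types without complaint
odds_and_ends = ["bathbombs", 4, 21.5]
print(len(odds_and_ends))  # 3

months_left = 2
try:
    # str + int is not allowed, so this line raises a TypeError
    message = "Only " + months_left + " more months"
except TypeError:
    # converting the integer to a string first makes the + work
    message = "Only " + str(months_left) + " more months"
print(message)  # Only 2 more months
```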

In [14]:
#What do you think will happen here?#
["cat", "dog"][0]

'cat'

In [13]:
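import random

my_favorite_things = ["funky patterned shirts", "homemade bread", "anything woolen", "bathbombs", "The Cure or The Cramps, depending on mood", "pancakes", "novels with dancing and scandal"]
lucky_numbers = [4, 21, 72]

todays_month = 11
# pick a random position from 0 up to (but not including) the length of the list
magic_number = random.randrange(len(my_favorite_things))

# b_month comes from the birthday cell above; it is a string, so convert it to an integer to compare
if todays_month == int(b_month):
    print("Malcolm likes " + my_favorite_things[magic_number])
else:
    # % 12 wraps around the end of the year, so December to November counts as 11 months, not 1
    print("Only " + str((int(b_month) - todays_month) % 12) + " more months until Malcolm's birthday ;)")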


Malcolm likes novels with dancing and scandal


One important quality of **lists** is that they’re mutable: we can add new entries to a list and even change existing entries.
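A quick sketch of what mutability looks like, using hypothetical chapter subjects made up for illustration. Strings are included for contrast, since they cannot be changed in place:

```python
# Hypothetical subjects, used only for illustration
subjects = ["Business", "Heart"]
subjects.append("Condolence")  # add a new entry at the end
subjects[0] = "Commerce"       # change an existing entry in place
print(subjects)                # ['Commerce', 'Heart', 'Condolence']

# Strings, by contrast, are immutable
word = "Letters"
try:
    word[0] = "B"
    changed = True
except TypeError:
    changed = False
print(changed)                 # False
```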

We’re going to test this mutability by adding the word “Chapter” back to each entry. Test the following two pieces of code. Why does one work and the other not?

In [15]:
for entry in book_index_n:
	entry = "Chapter" + entry

print(book_index_n)


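[' 1: Conversation ', ' 2: Dress ', ' 3: Traveling ', ' 4: How to Behave at a Hotel ', ' 5: Letters of Business ', ' 6: Letters of the Heart ', ' 7: Evening Parties as Hostess']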

In [49]:
for number in range(len(book_index_n)):
	book_index_n[number] = "Chapter" + book_index_n[number]
    
print(book_index_n)

['Chapter 1: Conversation ', 'Chapter 2: Dress ', 'Chapter 3: Traveling ', 'Chapter 4: How to Behave at a Hotel ', 'Chapter 5: Letters of Business ', 'Chapter 6: Letters of the Heart ', 'Chapter 7: Evening Parties as Hostess']
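Here is the difference between the two loops on a shortened, hypothetical stand-in for *book_index_n*. The loop name _entry_ is only a label for each string in turn, so giving it a new value leaves the list alone; assigning through an index writes into the list itself. _enumerate()_ is a built-in that hands us each position and entry together, doing the same job as _range(len(...))_:

```python
# A shortened, hypothetical stand-in for book_index_n
entries = [" 1: Conversation ", " 2: Dress "]

# Rebinding the loop name builds a new string but never touches the list
for entry in entries:
    entry = "Chapter" + entry
print(entries)  # [' 1: Conversation ', ' 2: Dress ']

# Assigning through an index changes the list itself
for number, entry in enumerate(entries):
    entries[number] = "Chapter" + entry
print(entries)  # ['Chapter 1: Conversation ', 'Chapter 2: Dress ']
```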


If we want to break these results down further, we can use the _split()_ method once again.

In [52]:
book_index_BEST = []

for item in book_index_n:
    # split each entry at the colon: the chapter label comes before it, the description after
    parts = item.split(":")
    ch = parts[0]
    des = parts[1]
    book_index_BEST.append([ch, des])

print(book_index_BEST)

[['Chapter 1', ' Conversation '], ['Chapter 2', ' Dress '], ['Chapter 3', ' Traveling '], ['Chapter 4', ' How to Behave at a Hotel '], ['Chapter 5', ' Letters of Business '], ['Chapter 6', ' Letters of the Heart '], ['Chapter 7', ' Evening Parties as Hostess']]
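*book_index_BEST* is now a list of lists, so reaching a single piece takes two sets of brackets: the first picks the chapter, the second picks the label or the description. A sketch using two entries copied from the output above; _strip()_ is the string method that trims the padding spaces:

```python
# Two entries copied from the output above
pairs = [["Chapter 5", " Letters of Business "], ["Chapter 6", " Letters of the Heart "]]

print(pairs[1])             # ['Chapter 6', ' Letters of the Heart ']
print(pairs[1][0])          # Chapter 6
print(pairs[1][1].strip())  # Letters of the Heart
```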


Now that we have our **list** up and running, we can be a bit more ambitious with our research. Instead of searching for only a single topic, let’s look at multiple topics. Our first step is going to be changing our variable from topic to topics. Now it’s official. Next, we’ll want to add our new topics using Python's _append()_ method.


In [54]:
topics = topic
topics.append("Parties", "Conduct", "Etiquette", "Dainty H'ors D'ouevres")
topics

AttributeError: 'str' object has no attribute 'append'

Why didn’t that work? For one, pluralizing topic to topics didn’t change it from a single string to a list. Making a list in Python requires brackets.

In [55]:
topics = [topic]
topics.append("Parties", "Conduct", "Etiquette", "Dainty H'ors D'ouevres")
topics

TypeError: append() takes exactly one argument (4 given)

This time we know for sure that _topics_ is a list, so why didn’t that work? The _append()_ method adds only a single item to a list at a time. We got greedy and tried to add too many at once. Working either alone or with a partner, come up with a way to add these new topics to the list _topics_ one by one.

In [56]:
for new_topic in ["Parties", "Conduct", "Etiquette", "Dainty H'ors D'ouevres"]:
	topics.append(new_topic)
topics


['Letters', 'Parties', 'Conduct', 'Etiquette', "Dainty H'ors D'ouevres"]
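For reference, lists also have an _extend()_ method that adds every item of another list in one call. This sketch contrasts it with handing _append()_ a whole list, which adds that list as a single nested entry:

```python
topics = ["Letters"]
topics.extend(["Parties", "Conduct", "Etiquette"])  # each item is added separately
print(topics)  # ['Letters', 'Parties', 'Conduct', 'Etiquette']

nested = ["Letters"]
nested.append(["Parties", "Conduct"])  # the whole list becomes ONE entry
print(nested)       # ['Letters', ['Parties', 'Conduct']]
print(len(nested))  # 2
```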

Now let's put it all together!

In [None]:
for topic in topics:
    for chapter in book_index_BEST:
        # chapter[0] is the label ("Chapter 5"), chapter[1] is the description
        if topic in chapter[1]:
            print(chapter[0])
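The loop above prints chapter labels, but not which topic found them. One possible variation, sketched here on three entries copied from *book_index_BEST*, stores each hit as a [topic, chapter] pair. Writing `for chapter, description in ...` unpacks each two-item entry into two names:

```python
# Three entries copied from book_index_BEST, for a self-contained example
book_index_BEST = [
    ["Chapter 5", " Letters of Business "],
    ["Chapter 6", " Letters of the Heart "],
    ["Chapter 7", " Evening Parties as Hostess"],
]
topics = ["Letters", "Parties", "Conduct"]

matches = []
for topic in topics:
    for chapter, description in book_index_BEST:
        if topic in description:
            matches.append([topic, chapter])  # remember which topic hit which chapter

print(matches)
# [['Letters', 'Chapter 5'], ['Letters', 'Chapter 6'], ['Parties', 'Chapter 7']]
```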

## Challenge

The challenge below leaps forward into important DH procedures and methods such as tokenizing, bigrams, and topic modeling. Much the way that making a pizza bagel in the microwave resembles cooking a Neapolitan pizza in a handmade wood-fired oven, this exercise is designed to conveniently give you the flavor and general idea of these concepts rather than offer an intensive explanation of their mechanics.
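For a taste of what "bigram" means, here is a minimal sketch: a bigram is simply a pair of neighboring tokens. Pairing a list of tokens with a slice of itself that starts one position later lines each word up with the word after it. The phrase comes from the challenge text; _zip()_ is Python's built-in for walking two sequences together:

```python
tokens = "employ thy time well".split(" ")

# tokens[1:] is the same list shifted by one, so zip pairs each token with the next
bigrams = list(zip(tokens, tokens[1:]))
print(bigrams)  # [('employ', 'thy'), ('thy', 'time'), ('time', 'well')]
```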

Some general advice, particularly when using __for__ and __if__ statements: routinely __print()__ your work so far. If, for instance, I'm getting errors telling me that something can't be iterated over, I'll want to print what _token_ holds inside the loop *for token in poormans_tokens:*.
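Here is what that habit can look like, on a short excerpt of the challenge text. Printing each token with _repr()_ wraps it in quotes, so capital letters and stuck-on punctuation are easy to spot:

```python
# A short excerpt of the challenge text
sample = "Leisure is time for doing something useful this leisure the diligent man will obtain, but"
tokens = sample.split(" ")

for token in tokens:
    print(repr(token))  # e.g. 'Leisure' and 'obtain,' show up exactly as Python sees them

# Exact matching treats 'Leisure' and 'leisure' as different words
print(tokens.count("leisure"))                     # 1
print(sample.lower().split(" ").count("leisure"))  # 2
```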

In [None]:
text = "Methinks I hear some of you say, must a man afford himself no leisure I will tell thee my friend what Poor Richard says employ thy time well if thou meanest to gain leisure and since thou art not sure of a minute throw not away an hour Leisure is time for doing something useful this leisure the diligent man will obtain, but the lazy man never so that, as Poor Richard says, a life of leisure and a life of laziness are two things Do you imagine that sloth will afford you more comfort than labor No, for as Poor Richard says trouble springs from idleness and grievous toil from needless ease Many without labor would live by their wits only but they break for want of stock Whereas industry gives comfort and plenty and respect fly pleasures, and they'll follow you The diligent spinner has a large shift and now I have a sheep and a cow everybody bids me good morrow all which is well said by Poor Richard."

topics = ["leisure", "labor", "time"]
synonyms_for_leisure = ["ease", "comfort"]
synonyms_for_labor = ["employ", "industry", "toil"]
synonyms_for_time = ["hour", "minute", "morrow"]


#what goes here?#
poormans_tokens = text.split(" ")



context_words = []
#fix this code to output the words that are within one space in either direction of a hit. Clue: you need to perform a mathematical operation to get these words#
for token in poormans_tokens:
	for topic in topics:
		if topic in token:
			context_words.append([topic[-1], topic, topic[+1]])


leisure_context = []
labor_context = []
time_context = []

#rewrite the code above to perform a similar operation with each synonym#

for entry in context_words:
	if entry[1] == "leisure":
		leisure_context.append(entry)
	elif entry[1] == "labor":
		labor_context.append(entry)
	elif entry[1] == "time":
		time_context.append(entry)

#print the lists and, using human judgment (we're going to do counts later on), write down whether you think any of our topics and their synonyms appear in more syntactically similar conditions than others. How would these results change if you took the words within two positions of a topic word instead of one? i.e., "to gain leisure and since" instead of "gain leisure and" for leisure#