# Simple Python

[Python tutorial](https://docs.python.org/3/tutorial/) and [documentation](https://docs.python.org/3/).

For help with Jupyter notebooks, press `h` while a cell is not in edit mode, or see the [documentation](https://jupyter-notebook.readthedocs.io/en/stable/).

## Part 1: Text
The `text.txt` file contains the opening sentences of Charles Dickens' novel *A Tale of Two Cities*. The code below reads the file, stores its contents in a string variable `text`, and prints the first 500 characters.

In [1]:
f = open('text.txt')
text = f.read()
f.close()

print(text[:500], '...')

it was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of light, it was the season of darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to heaven, we were all going direct the other way- in short, the period was so far like the present period, that some of its noisiest ...
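
The same read can also be written with a `with` statement, which closes the file automatically. The sketch below is only an illustration: it writes a short, hypothetical stand-in for `text.txt` into a temporary directory so that it runs on its own, and the names `sample`, `text_sample`, and `file_closed` are not part of the notebook.

```python
import os
import tempfile

# Hypothetical stand-in for text.txt so the sketch runs on its own.
sample = "it was the best of times, it was the worst of times."

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "text.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # The file is closed automatically when the with-block exits,
    # even if read() raises an exception.
    with open(path, encoding="utf-8") as f:
        text_sample = f.read()
    file_closed = f.closed

print(text_sample[:20], "...")
```

With this form there is no separate `f.close()` call to forget.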


### Question
Print the first and the last sentence of the text. You may assume sentences only end with a period (`'.'`).
Hint: use the Python `split` method for strings (https://docs.python.org/3/library/stdtypes.html?highlight=split#str.split) with `sep='.'`.

In [2]:
# Store all separate sentences into a list
sentenceList = text.split(".")
firstSentence = sentenceList[0]
# The piece after the final period is not a sentence, so the last sentence is the second-to-last item
lastSentence = sentenceList[-2]
# We add a period (.) to replace the stripped period from .split()
print("First Sentence:\n" + firstSentence + ".")
print("\n")
# [1:] gets rid of the unnecessary space at the beginning
print("Last Sentence:\n" + lastSentence[1:] + ".")

First Sentence:
it was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of light, it was the season of darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to heaven, we were all going direct the other way- in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.


Last Sentence:
thus did the year one thousand seven hundred and seventy-five conduct their greatnesses, and myriads of small creatures-the creatures of this chronicle among the rest-along the roads that lay before them.


### Findings

**First Sentence:**
it was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of light, it was the season of darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to heaven, we were all going direct the other way- in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.

**Last Sentence:**
thus did the year one thousand seven hundred and seventy-five conduct their greatnesses, and myriads of small creatures-the creatures of this chronicle among the rest-along the roads that lay before them.
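
An alternative that avoids relying on the `-2` index and the `[1:]` slice is to strip each piece and drop the empty ones. This is a minimal sketch on a hypothetical three-sentence string (`sample`), not on the full `text.txt`, and it keeps the same assumption that sentences only end with a period.

```python
# Hypothetical sample; the notebook's text.txt is much longer.
sample = "it was the best of times. it was the worst of times. it was the age of wisdom.\n"

# Strip whitespace from each piece and drop pieces that end up empty,
# such as the fragment after the final period.
sentences = [s.strip() for s in sample.split(".") if s.strip()]

# split() removes the separator, so the period is added back.
first_sentence = sentences[0] + "."
last_sentence = sentences[-1] + "."

print("First Sentence:\n" + first_sentence)
print("Last Sentence:\n" + last_sentence)
```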

### Question
How many total vowels (`a`, `e`, `i`, `o`, `u`) appear in the text?

In [3]:
# Code for question 2
aCount = text.count("a")
eCount = text.count("e")
iCount = text.count("i")
oCount = text.count("o")
uCount = text.count("u")
vowelDict = {"a": aCount, "e": eCount, "i": iCount, "o": oCount, "u": uCount}
print(vowelDict)
vowelCount = sum(vowelDict.values())
print("Total Vowels: %d" %(vowelCount))

{'a': 384, 'e': 581, 'i': 293, 'o': 354, 'u': 123}
Total Vowels: 1735


### Findings
Total Vowels: 1735
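
The five separate `count` calls can be folded into a dictionary comprehension. The sketch below uses a short hypothetical string (`sample`) rather than the full text. Note that `str.count` is case-sensitive, so an uppercase vowel would need `sample.lower()` first.

```python
# Hypothetical sample; the notebook counts over the full text instead.
sample = "it was the best of times"

# One entry per vowel, built in a single pass over the vowel letters.
vowel_counts = {vowel: sample.count(vowel) for vowel in "aeiou"}
total_vowels = sum(vowel_counts.values())

print(vowel_counts)
print("Total Vowels: %d" % total_vowels)
```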

### Question
A trigram is a string of exactly 3 characters, where blank spaces count as characters. For example, the first five trigrams in the text are:
1. `'it '`
2. `'t w'`
3. `' wa'`
4. `'was'`
5. `'as '`

What is the most common trigram in the text?   

In [3]:
from collections import Counter
# text[idx:idx+3] is the trigram starting at idx; the last valid start is
# len(text) - 3, so every slice has exactly 3 characters
trigramFreq = dict(Counter(text[idx : idx + 3] for idx in range(len(text) - 2)))
trigramKeys = list(trigramFreq.keys())
trigramValues = list(trigramFreq.values())
# The key corresponding to the maximum value
maxKey = trigramKeys[trigramValues.index(max(trigramValues))]
print("The most common trigram in the text is '%s' with %d appearances." %(maxKey, max(trigramValues)))

The most common trigram in the text is ' th' with 136 appearances.


### Findings
The most common trigram in the text is ' th' with 136 appearances.
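
`Counter` can also report its most frequent key directly through `most_common`, which removes the need for parallel key and value lists. The following is a sketch on a hypothetical short string (`sample`). When several trigrams tie for first place, as can happen in a short sample, only one of them is returned.

```python
from collections import Counter

# Hypothetical sample; the notebook uses the full text instead.
sample = "it was the best of times, it was the worst of times"

# The last valid start index is len(sample) - 3, so every slice has 3 characters.
trigrams = [sample[i:i + 3] for i in range(len(sample) - 2)]
trigram_counts = Counter(trigrams)

# most_common(1) returns a one-item list of (trigram, count) pairs.
top_trigram, top_count = trigram_counts.most_common(1)[0]
print("Most common trigram: '%s' with %d appearances." % (top_trigram, top_count))
```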

## Part 2: Rides

The [boston.csv](boston.csv) file contains data on weekday Uber rides in the Boston, Massachusetts metropolitan area from the [Uber Movement](https://movement.uber.com) project. The `sourceid` and `dstid` columns contain codes corresponding to the start and end locations of each ride. The `hod` column contains codes corresponding to the hour of the day the ride took place. The `ride time` column contains the length of the ride, in minutes.

The code below will open the file and read the data as a list of rows, with each row represented as a string. It then prints the first four rows. Note that the first (index 0) row contains the column headers. 

In [4]:
f = open('boston.csv')
data = f.readlines()
f.close()

for row in data[:4]:
    print(row)

sourceid,dstid,hod,ride time

584,33,7,11.866000000000001

1013,1116,13,17.799333333333333

884,1190,22,19.348833333333335



### Question
How many rides are listed in the file?

In [5]:
# Code for question 4
rideCount = 0
# Excludes first line (column headers)
for row in data[1:]:
    rideCount += 1
print(rideCount)

200000


### Findings
There are 200,000 rides listed in the file.
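
Because `readlines()` returns a list, the same count can be taken from its length without a loop. The rows below are a hypothetical stand-in (`data_sample`) built from the rows printed earlier, and the sketch assumes the file has no blank lines after the last ride.

```python
# Hypothetical rows in the same shape as boston.csv: one header, then rides.
data_sample = [
    "sourceid,dstid,hod,ride time\n",
    "584,33,7,11.866000000000001\n",
    "1013,1116,13,17.799333333333333\n",
    "884,1190,22,19.348833333333335\n",
]

# Subtract 1 to exclude the header row.
ride_count = len(data_sample) - 1
print(ride_count)
```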

### Question
What is the maximum length of a ride?

In [6]:
lengthList = []
for row in data[1:]:
    attributes = row.split(",")
    lengthList.append(float(attributes[3]))
maxLength = max(lengthList)
print("Maximum Length = %f" %(maxLength))

Maximum Length = 1471.986167


### Findings
The maximum length of a ride is 1471.986167 minutes.
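
The standard library's `csv` module can parse the rows by column name, which avoids the positional index `attributes[3]`. The sketch below reads from an in-memory string (`csv_text`, a hypothetical stand-in for `boston.csv`) so that it runs without the data file.

```python
import csv
import io

# Hypothetical stand-in for boston.csv, using the rows printed earlier.
csv_text = """sourceid,dstid,hod,ride time
584,33,7,11.866000000000001
1013,1116,13,17.799333333333333
884,1190,22,19.348833333333335
"""

# DictReader uses the header row as keys, so columns are read by name.
reader = csv.DictReader(io.StringIO(csv_text))
ride_times = [float(row["ride time"]) for row in reader]

longest = max(ride_times)
print("Maximum Length = %f" % longest)
```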

### Question
What is the average length of a ride?

In [8]:
# Code for question 6
import statistics
avgLength = statistics.mean(lengthList)
print("Average Length = %f" %(avgLength))

Average Length = 16.502478


### Findings
The average length of a ride is 16.502478 minutes.
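
For reference, the mean is the sum of the ride lengths divided by the number of rides, which is what `statistics.mean` computes. A small sketch with hypothetical values (`ride_times`) shows the two agreeing:

```python
import statistics

# Hypothetical ride lengths in minutes.
ride_times = [10.0, 20.0, 30.0]

mean_from_library = statistics.mean(ride_times)
mean_by_hand = sum(ride_times) / len(ride_times)

print("Average Length = %f" % mean_from_library)
```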

### Question
What percentage of rides are under 10 minutes?

In [7]:
# Code for question 7
underCount = 0
for ride in lengthList:
    if ride < 10:
        underCount += 1
# This is a fraction between 0 and 1; multiply by 100 for a percentage
underPercent = underCount / rideCount
print(underPercent)

0.22829


### Findings
22.829% of rides are under 10 minutes.
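
The conversion from a count to a percentage can be sketched as follows. The ride lengths are hypothetical (`ride_times`), and a ride of exactly 10 minutes is not counted as "under 10 minutes".

```python
# Hypothetical ride lengths in minutes.
ride_times = [5.0, 9.99, 10.0, 12.5, 30.0]

# Count the rides strictly shorter than 10 minutes.
under_count = sum(1 for t in ride_times if t < 10)

# Multiply by 100 to turn the fraction into a percentage.
under_percent = 100 * under_count / len(ride_times)
print(f"{under_percent:.1f}% of rides are under 10 minutes")
```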

### Question
For each HOD (hour of day), count how many rides take place during that hour.

In [8]:
# Code for question 8
HODList = []
for row in data[1:]:
    attributes = row.split(",")
    HODList.append(int(attributes[2]))
from collections import Counter
HODCounter = dict(Counter(HODList))
print(dict(sorted(HODCounter.items(), key=lambda item: item[1])))

{3: 4296, 4: 4371, 5: 5343, 2: 5555, 1: 6568, 6: 7424, 0: 7600, 18: 8365, 11: 8755, 13: 8813, 12: 8885, 19: 8926, 10: 9053, 7: 9173, 23: 9258, 9: 9259, 8: 9510, 14: 9688, 15: 9748, 20: 9771, 22: 9863, 21: 9894, 16: 9930, 17: 9952}


### Findings
The number of rides that take place during each hour of day is shown below (in ascending order of ride count):

{3: 4296, 4: 4371, 5: 5343, 2: 5555, 1: 6568, 6: 7424, 0: 7600, 18: 8365, 11: 8755, 13: 8813, 12: 8885, 19: 8926, 10: 9053, 7: 9173, 23: 9258, 9: 9259, 8: 9510, 14: 9688, 15: 9748, 20: 9771, 22: 9863, 21: 9894, 16: 9930, 17: 9952}
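
The same counts can be ordered by hour instead of by frequency, which may be easier to scan. This sketch uses a small hypothetical list of hours (`hod_sample`) and shows both orderings.

```python
from collections import Counter

# Hypothetical hour-of-day codes for six rides.
hod_sample = [7, 13, 22, 7, 7, 13]
hod_counts = Counter(hod_sample)

# Sorting the (hour, count) pairs directly orders them by hour.
by_hour = dict(sorted(hod_counts.items()))

# Sorting on item[1] orders them by count, as in the cell above.
by_count = dict(sorted(hod_counts.items(), key=lambda item: item[1]))

print(by_hour)
print(by_count)
```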

### Question
What are the three most common **start** locations (`sourceid`) for rides in the dataset? See the Python sorting how-to: https://docs.python.org/3/howto/sorting.html.

In [9]:
sourceList = []
for row in data[1:]:
    attributes = row.split(",")
    sourceList.append(int(attributes[0]))
from collections import Counter
sourceCounter = dict(Counter(sourceList))
# reverse=True sorts in descending order of frequency
sortedSourceCounter = dict(sorted(sourceCounter.items(), key=lambda item: item[1], reverse = True))

### Findings
The following are the three most common start locations along with their respective frequencies:

1. 885: 1181 

2. 498: 1067 

3. 435: 1057
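
As a cross-check, `Counter.most_common(n)` returns the `n` most frequent items without sorting the whole dictionary by hand. The sketch below runs on a hypothetical list of start codes (`source_sample`), not on the real dataset.

```python
from collections import Counter

# Hypothetical start-location codes for ten rides.
source_sample = [885, 498, 885, 435, 885, 498, 435, 498, 885, 12]

# most_common(3) returns the three (sourceid, count) pairs with the highest counts.
top_three = Counter(source_sample).most_common(3)

for rank, (source_id, count) in enumerate(top_three, start=1):
    print("%d. %d: %d" % (rank, source_id, count))
```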