# Table of Contents
We are going to cover the following topics in this class:
1. Python Functions
2. Working with Files
3. Using APIs to Collect Data
4. Reading JSON Data
5. Processing Textual Data

# 1. Python Functions

* Functions are the primary and most important method of code organization and reuse in Python.
* As a rule of thumb, if you anticipate needing to repeat the same or very similar code more than once, it may be worth writing a reusable function.
* Functions can also help make your code more readable by giving a name to a group of Python statements.

In [None]:
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

In [None]:
my_function(10, 20)

In [None]:
my_function(5, 6, z=0.7)

In [None]:
my_function(3.14, 7, 3.5)
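Arguments can be passed by position or by keyword. Keyword arguments must come after any positional ones, but among themselves they can appear in any order. The sketch below repeats my_function so it runs on its own:

```python
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

print(my_function(10, 20))           # z falls back to its default, 1.5
print(my_function(x=10, y=20, z=2))  # every argument passed by keyword
print(my_function(10, z=2, y=20))    # positional first, then keywords in any order
```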

## 1.1 Namespaces and Scope

In [None]:
a = 8

In [None]:
a

In [None]:
def func():
    a = []
    for i in range(5):
        a.append(i)

In [None]:
func()

In [None]:
a

In [None]:
a = []
def func():
    for i in range(5):
        a.append(i)

In [None]:
func()

In [None]:
a

In [None]:
func()

In [None]:
a
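The two versions of func differ in one way: the first assigns to a inside the function, which creates a new local variable, and the second only mutates the existing global list. The sketch below shows the same rule with hypothetical names (counter, set_local, increment_global). It also uses the global statement, which the cells above do not, to rebind a module-level name from inside a function:

```python
counter = 0

def set_local():
    # Assignment inside a function creates a local name that shadows the global one
    counter = 100
    return counter

def increment_global():
    # The global declaration makes the assignment rebind the module-level name
    global counter
    counter += 1

set_local()
print(counter)  # the global is untouched
increment_global()
print(counter)
```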

## 1.2 Returning Multiple Values

In [None]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

a, b, c = f()

In [None]:
a

In [None]:
b

In [None]:
c
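Strictly speaking, f returns a single object, a tuple, which the assignment a, b, c = f() then unpacks. The sketch below makes that visible. It also adds a variant, f_dict (a hypothetical name), that returns a dict for cases where named results read more clearly:

```python
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

result = f()
print(type(result), result)  # the three values travel together as one tuple

def f_dict():
    # Alternative: return the values under descriptive keys
    return {'a': 5, 'b': 6, 'c': 7}

print(f_dict()['b'])
```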

## 1.3 Anonymous (Lambda) Functions

In [None]:
def short_function(x):
    return x * 2

In [None]:
short_function(7)

In [None]:
equiv_anon = lambda x: x * 2

In [None]:
equiv_anon(7)
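Lambdas are most useful where a function expects another small function as an argument, for example the key parameter of list.sort or the first argument of map. A short sketch:

```python
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']
# Sort by the number of distinct letters in each string (the sort is stable)
strings.sort(key=lambda x: len(set(x)))
print(strings)

# Apply an anonymous doubling function to every element
doubled = list(map(lambda x: x * 2, [1, 2, 3]))
print(doubled)
```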

## Exercise
Write a Python function that takes a number as a parameter and checks whether that number is prime.

Is 1 a prime number? False \
Is 2 a prime number? True \
Is 3 a prime number? True \
Is 4 a prime number? False

Prime number definition: an integer greater than 1 that is divisible only by 1 and itself.

In [1]:
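import math

def is_prime(n):
    # Primes are integers greater than 1, so 0, 1 and negative numbers are excluded
    if n < 2:
        return False
    # Any factor larger than sqrt(n) pairs with one smaller than sqrt(n),
    # so trial division only needs to go up to isqrt(n)
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True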


In [None]:
is_prime(46587)

In [None]:
is_prime(3)
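The same trial-division idea can also be written more compactly with all() and a generator expression. Because all() stops at the first divisor it finds, the loop ends early for composite numbers. The name is_prime_compact is hypothetical:

```python
import math

def is_prime_compact(n):
    # True only if n >= 2 and no i in [2, isqrt(n)] divides n evenly
    return n >= 2 and all(n % i != 0 for i in range(2, math.isqrt(n) + 1))

print([n for n in range(20) if is_prime_compact(n)])
```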

# 2. Files and the Operating System

In [None]:
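path = 'segismundo.txt'  # relative to the notebook's working directory
# The file contains Spanish text, so state the encoding explicitly
f = open(path, encoding='utf-8')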

In [None]:
f.close()

In [None]:
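# The with block closes the file automatically, even if an error occurs
with open(path, encoding='utf-8') as f:
    lines = f.readlines()  # each string keeps its trailing '\n'
lines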

In [None]:
# rstrip() removes the trailing newline (and any other trailing whitespace)
with open(path, encoding='utf-8') as f:
    lines = [x.rstrip() for x in f]

In [None]:
lines
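The cells above only read an existing file. The sketch below writes a small UTF-8 file into a temporary directory and reads it back line by line, which makes the whole round trip reproducible without segismundo.txt. The sample text and file name are hypothetical:

```python
import os
import tempfile

# Hypothetical sample text; the second line includes a non-ASCII character
sample_lines = ['first line', 'segunda línea', 'third line']

with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = os.path.join(tmp_dir, 'sample.txt')
    # Mode 'w' creates the file (or truncates an existing one)
    with open(tmp_path, 'w', encoding='utf-8') as out:
        for line in sample_lines:
            out.write(line + '\n')
    # Iterating over a file yields one line at a time, newline included
    with open(tmp_path, encoding='utf-8') as src:
        read_back = [x.rstrip() for x in src]

print(read_back)
```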

# 3. Using APIs to collect data
## Example: Getting Reddit comments containing the phrase 'Queen Elizabeth' posted between four and three days ago

In [None]:
# urllib.parse provides helpers for building URLs, e.g. urlencode for query strings
import urllib.parse

In [None]:
# urlopen sends an HTTP request and returns a response object
from urllib.request import urlopen

In [None]:
import json

In [None]:
search_term = 'Queen Elizabeth'
before = '3d'
after = '4d'
num_comments = 100

In [None]:
url = 'https://api.pushshift.io/reddit/search/comment/?'
parameters = {
    'q': search_term,
    'before': before,
    'after': after,
    'size': num_comments
}

In [None]:
my_url = url + urllib.parse.urlencode(parameters, safe=',')
print(my_url)
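urlencode turns the parameter dict into a query string. It percent-encodes each value and, by default, replaces spaces with '+'. The sketch below rebuilds the same URL offline and decodes it again with parse_qs to show that the encoding is reversible. Note that parse_qs returns every value as a list of strings:

```python
import urllib.parse

base_url = 'https://api.pushshift.io/reddit/search/comment/?'
params = {'q': 'Queen Elizabeth', 'before': '3d', 'after': '4d', 'size': 100}

query = urllib.parse.urlencode(params, safe=',')
full_url = base_url + query
print(full_url)

# Decoding the query string recovers the original values (as strings)
decoded = urllib.parse.parse_qs(query)
print(decoded)
```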

In [None]:
# Send the HTTP request and keep the response object (its body has not been read yet)
response = urlopen(my_url)

In [None]:
response

# 4. Reading JSON data

In [None]:
# json.loads parses a JSON document (str or bytes) into Python objects;
# the top-level JSON object in this response becomes a Python dict
data_json = json.loads(response.read())

In [None]:
data_json

In [None]:
data_json.keys()

In [None]:
myData = data_json['data']
myData

In [None]:
len(myData)
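The steps above (parse, list the keys, pull out 'data', count the comments) can be reproduced offline. The sketch below uses a small hypothetical response body that only assumes the same shape as the Pushshift output: a top-level object whose 'data' key holds a list of comment objects.

```python
import json

# Hypothetical response body with the same shape as the API output above
raw = b'{"data": [{"body": "First comment", "score": 3}, {"body": "Second comment", "score": 5}]}'

parsed = json.loads(raw)            # bytes are accepted as well as str
print(type(parsed))                 # a JSON object becomes a dict
print(list(parsed.keys()))
sample_comments = parsed['data']    # a JSON array becomes a list
print(len(sample_comments))
print(sample_comments[0]['body'])
```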

In [None]:
firstComment = myData[0]
firstComment

In [None]:
# Printing JSON object using indent formatting
print(json.dumps(firstComment, indent=4))
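json.dumps is the inverse of json.loads. The indent argument only changes the layout, and sort_keys orders the fields alphabetically, so the pretty-printed text still parses back to the same data. A sketch with a hypothetical comment dict:

```python
import json

comment = {'score': 3, 'author': 'example_user', 'body': 'Hypothetical comment text'}

pretty = json.dumps(comment, indent=4, sort_keys=True)
print(pretty)

# Formatting does not change the data: parsing the text gives back an equal dict
print(json.loads(pretty) == comment)
```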

## Exercise
Write a function that takes a word as a parameter, retrieves 100 Reddit comments containing that word, and returns the average score of those comments.
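One possible sketch of getAvgScore reuses the URL-building and JSON steps from above. It assumes that each comment object has a numeric 'score' field and that the response keeps its 'data' list. Public access to the Pushshift API has been restricted since 2023, so the request itself may fail. The averaging is split into a hypothetical helper, average_score, so it can be checked without a network call:

```python
import json
import urllib.parse
from urllib.request import urlopen

# Endpoint used earlier in this notebook; it may no longer be publicly reachable
BASE_URL = 'https://api.pushshift.io/reddit/search/comment/?'


def average_score(comments):
    """Return the mean 'score' of a list of comment dicts, or None if the list is empty."""
    # Assumption: every comment dict carries a numeric 'score' field
    scores = [comment['score'] for comment in comments]
    if not scores:
        return None
    return sum(scores) / len(scores)


def getAvgScore(word, size=100):
    """Fetch up to `size` comments containing `word` and return their average score."""
    parameters = {'q': word, 'size': size}
    request_url = BASE_URL + urllib.parse.urlencode(parameters)
    with urlopen(request_url) as response:
        data_json = json.loads(response.read())
    return average_score(data_json['data'])
```

Returning None for an empty result avoids a ZeroDivisionError when no comments match the word.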

In [None]:
getAvgScore("Russia")

# 5. Processing Textual Data

## Using TextBlob

Installation instructions:
    https://textblob.readthedocs.io/en/latest/install.html

Using conda:

conda install -c conda-forge textblob

Then download the NLTK corpora that TextBlob needs:

python -m textblob.download_corpora

In [None]:
# %conda installs into the environment of the running kernel (unlike !conda)
%conda install -c conda-forge textblob

In [None]:
from textblob import TextBlob

In [None]:
firstComment

In [None]:
# Getting the 'body' field of firstComment
body = firstComment['body']
body

In [None]:
w = TextBlob(body)

In [None]:
w.tags

In [None]:
w.noun_phrases

### Sentiment analysis
The sentiment property returns a namedtuple of the form Sentiment(polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0], where -1.0 is very negative and 1.0 is very positive. The subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
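Because the result is a namedtuple, each field can be read either by position or by name. The sketch below builds a stand-in with the same field names. The values are made up and do not come from TextBlob:

```python
from collections import namedtuple

# Stand-in with the same fields as the object returned by w.sentiment (made-up values)
Sentiment = namedtuple('Sentiment', ['polarity', 'subjectivity'])
s = Sentiment(polarity=0.25, subjectivity=0.6)

print(s[0] == s.polarity)  # index 0 and the field name refer to the same value
print(s.subjectivity)
```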

In [None]:
w.sentiment

In [None]:
w.sentiment.polarity  # same value as w.sentiment[0]

## Calculating the average sentiment polarity of the 100 comments stored in myData

In [None]:
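# Use new names (text, blob) so the loop does not overwrite body and w,
# which the cells below still use for the first comment
sentiments = []
for comment in myData:
    text = comment['body']
    blob = TextBlob(text)
    sentiments.append(blob.sentiment.polarity)

sum(sentiments) / len(sentiments)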

### Tokenization
Break a TextBlob into words or sentences.

In [None]:
w.words

In [None]:
w.sentences
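To show what tokenization means, here is a deliberately naive regex-based sketch. It is not TextBlob's tokenizer, which handles abbreviations, contractions and other edge cases far better. The names naive_words and naive_sentences are hypothetical:

```python
import re

def naive_words(text):
    # Runs of letters, digits or apostrophes count as words; punctuation is dropped
    return re.findall(r"[A-Za-z0-9']+", text)

def naive_sentences(text):
    # Split after '.', '!' or '?' when followed by whitespace
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

sample = "TextBlob is handy. It splits text! Does it?"
print(naive_words(sample))
print(naive_sentences(sample))
```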

In [None]:
for sentence in w.sentences:
    print(sentence.sentiment)

In [None]:
w

In [None]:
w.words[0]

In [None]:
w.words[0].pluralize()

### WordNet Integration

In [None]:
from textblob import Word
word = Word("frivolous")
word

In [None]:
import nltk
nltk.download('omw-1.4')

In [None]:
word.definitions

In [None]:
b = TextBlob("I am succh a greattt wriiter!")
b.correct()

In [None]:
fw = Word('falibility')
fw.spellcheck()
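TextBlob's documentation describes its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below is a simplified, self-contained version of that edit-distance idea. It is not TextBlob's code. It generates every string one or two edits away from the input, keeps the ones found in a vocabulary, and picks the most frequent. The tiny WORD_COUNTS table is hypothetical, whereas a real corrector counts words in a large corpus:

```python
import string

# Toy word-frequency table (hypothetical)
WORD_COUNTS = {'i': 50, 'am': 40, 'such': 30, 'a': 60, 'great': 20,
               'writer': 10, 'fallibility': 2}


def edits1(word):
    """Every string one deletion, transposition, replacement or insertion away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the candidates that appear in the vocabulary."""
    return {w for w in words if w in WORD_COUNTS}


def correct_word(word):
    """Prefer the word itself, then edit distance 1, then 2; break ties by frequency."""
    word = word.lower()
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or {word})
    return max(candidates, key=lambda w: WORD_COUNTS.get(w, 0))


print([correct_word(w) for w in 'I am succh a greattt wriiter'.split()])
print(correct_word('falibility'))
```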

In [None]:
w = TextBlob(body)
w.words.count('the')

In [None]:
w.words.count('the', case_sensitive=True)
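The difference between the two counts above comes down to whether words are lower-cased before they are compared. The same distinction can be sketched in plain Python with collections.Counter on a hypothetical word list:

```python
from collections import Counter

words = ['The', 'cat', 'saw', 'the', 'dog', 'and', 'THE', 'bird']

# Case-sensitive: 'The', 'the' and 'THE' are three different keys
case_sensitive = Counter(words)
# Case-insensitive: lower-case every word before counting
case_insensitive = Counter(word.lower() for word in words)

print(case_sensitive['the'], case_insensitive['the'])
```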

In [None]:
w

In [None]:
w.ngrams(n=2)

In [None]:
w.ngrams(n=3)
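An n-gram is a run of n consecutive words. Sliding a window of width n along the word list gives all of them. The plain-Python sketch below returns tuples, whereas TextBlob's ngrams returns a list of WordList objects. The function name ngrams is reused here for illustration only:

```python
def ngrams(words, n=2):
    # One window per start position; a list shorter than n yields no n-grams
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

tokens = ['now', 'is', 'better', 'than', 'never']
print(ngrams(tokens, n=2))
print(ngrams(tokens, n=3))
```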

In [None]:
import nltk
nltk.download('stopwords')

In [None]:
from nltk.corpus import stopwords

In [None]:
# Build a set of English stop words (a set makes membership tests fast)
stop = set(stopwords.words('english'))

In [None]:
stop

In [None]:
# Remove stop words with a set difference; lower-case the words first
# because NLTK's stop words are all lower case
print(set(w.words.lower()) - stop)
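A set difference is quick, but it discards word order and repeated words. When those matter, for example before building n-grams, a list comprehension can filter the words instead. The sketch uses a tiny hypothetical stop-word set in place of NLTK's much longer list:

```python
# Tiny hand-picked stop-word set for illustration (NLTK's English list is much longer)
stop_sample = {'the', 'a', 'is', 'on', 'and'}
words = ['The', 'cat', 'is', 'on', 'the', 'mat', 'and', 'the', 'cat', 'sleeps']

# Set difference: duplicates and ordering are lost
print({word.lower() for word in words} - stop_sample)

# List comprehension: order and repeats are kept
filtered = [word for word in words if word.lower() not in stop_sample]
print(filtered)
```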