# Python Functions

* Functions are the primary method of code organization and reuse in Python.
* As a rule of thumb, if you anticipate needing to repeat the same or very similar code more than once, it may be worth writing a reusable function.
* Functions can also help make your code more readable by giving a name to a group of Python statements.
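As a small illustration of the last point, the function below gives a name to one formatting step. It is a hypothetical helper (`format_name` is not part of any library) whose output matches the "Firstname Lastname" format requested in the Nobel exercise later in this notebook:

```python
def format_name(firstname, surname):
    """Return a person's name in 'Firstname Lastname' form."""
    return f"{firstname} {surname}"


print(format_name("Carolyn", "Bertozzi"))  # Carolyn Bertozzi
```

Once the logic has a name, every caller writes `format_name(...)` instead of repeating the f-string, and a change to the format only has to be made in one place.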

In [2]:
from urllib.request import urlopen, Request
import json

## Exercise
Write a Python function that takes a number as a parameter and checks whether that number is prime.

Is 1 a prime number? False \
Is 2 a prime number? True \
Is 3 a prime number? True \
Is 4 a prime number? False

Prime number definition: a whole number greater than 1 that is divisible only by 1 and itself.
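One possible solution is a sketch using trial division, which is not the only approach. It relies on the fact that if `n` has a divisor other than 1 and itself, it has one no larger than the square root of `n`, so the loop can stop once `d * d` exceeds `n`:

```python
def is_prime(n):
    """Return True if n is a prime number, False otherwise."""
    if n < 2:
        return False          # 0, 1 and negative numbers are not prime
    if n < 4:
        return True           # 2 and 3 are prime
    if n % 2 == 0:
        return False          # every other even number is composite
    d = 3
    while d * d <= n:         # only odd divisors up to sqrt(n) need checking
        if n % d == 0:
            return False
        d += 2
    return True


for n in range(1, 5):
    print(f"Is {n} a prime number? {is_prime(n)}")
```

The loop reproduces the four expected answers listed above.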

# Reading JSON Data from a URL

## Exercise
Create a list of the names of the Nobel laureates appearing in this file:
https://api.nobelprize.org/v1/prize.json

Each name should be in the format Firstname Lastname, e.g., Carolyn Bertozzi

In [3]:
url = 'https://api.nobelprize.org/v1/prize.json'
request = Request(url)
request.add_header('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')
response = urlopen(request).read()

In [7]:
response_json = json.loads(response)

In [10]:
response_json['prizes'][0]

{'year': '2022',
 'category': 'chemistry',
 'laureates': [{'id': '1015',
   'firstname': 'Carolyn',
   'surname': 'Bertozzi',
   'motivation': '"for the development of click chemistry and bioorthogonal chemistry"',
   'share': '3'},
  {'id': '1016',
   'firstname': 'Morten',
   'surname': 'Meldal',
   'motivation': '"for the development of click chemistry and bioorthogonal chemistry"',
   'share': '3'},
  {'id': '743',
   'firstname': 'Barry',
   'surname': 'Sharpless',
   'motivation': '"for the development of click chemistry and bioorthogonal chemistry"',
   'share': '3'}]}

In [11]:
response_json['prizes'][0]['laureates'][0]['firstname']

'Carolyn'
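Building on the structure shown above (a `prizes` list whose entries hold a `laureates` list), here is a minimal sketch of a solution to the exercise. It uses `.get` defensively, assuming that some prizes (for example, years with no award) may lack a `laureates` key and that some laureates (for example, organizations) may lack a `surname`; `laureate_names` is a hypothetical name:

```python
def laureate_names(data):
    """Return 'Firstname Lastname' strings for every laureate in the parsed prize JSON."""
    names = []
    for prize in data.get('prizes', []):
        # Assumed: a prize that was not awarded may have no 'laureates' key
        for laureate in prize.get('laureates', []):
            first = laureate.get('firstname', '')
            last = laureate.get('surname', '')
            # Join only the parts that exist, so a missing surname leaves no trailing space
            names.append(' '.join(part for part in (first, last) if part))
    return names
```

In the notebook, call `laureate_names(response_json)` right after the cells above. Someone who won more than one prize appears once per prize; `list(dict.fromkeys(names))` removes repeats while keeping the original order.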

# Processing Textual Data

In [None]:
url = 'https://www.justice.gov/api/v1/blog_entries.json?pagesize=2'
request = Request(url)
request.add_header('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')
response = urlopen(request).read()

In [None]:
response_json = json.loads(response)
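The request-and-parse cells for the Justice Department API repeat the Nobel cells almost line for line, with only the URL changed, which is the situation the rule of thumb at the top of this notebook describes. A minimal sketch of a reusable helper follows; `fetch_json` and the 30-second timeout are assumptions, and the browser-like User-Agent string is copied from the cells above, presumably because some servers reject urllib's default one:

```python
import json
from urllib.request import Request, urlopen

# Same browser-like User-Agent string as the request cells above
USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')


def fetch_json(url, timeout=30):
    """Download url and parse the response body as JSON."""
    request = Request(url, headers={'User-Agent': USER_AGENT})
    # The with-block closes the connection once the body has been read
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

With it, each download becomes a single line, e.g. `response_json = fetch_json('https://api.nobelprize.org/v1/prize.json')`.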

In [None]:
response_json['results'][0].keys()

In [None]:
response_json['results'][0]['body']

## Using TextBlob

Installation instructions:
    https://textblob.readthedocs.io/en/latest/install.html

Using conda:

conda install -c conda-forge textblob

Then download the corpora that TextBlob uses:

python -m textblob.download_corpora

In [5]:
from textblob import TextBlob

In [12]:
firstComment = response_json['results'][0]


In [None]:
# First 200 characters of the body of the first entry
body = firstComment['body'][:200]
body

In [None]:
w = TextBlob(body)

In [None]:
import nltk
nltk.download('punkt')

In [None]:
nltk.download('averaged_perceptron_tagger')

In [None]:
w.tags

In [None]:
nltk.download('brown')

In [None]:
w.noun_phrases

### Sentiment analysis
The sentiment property returns a namedtuple of the form Sentiment(polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0]. The subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.

In [None]:
w.sentiment

In [None]:
w.sentiment[0]
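`w.sentiment[0]` works because the result is a namedtuple, so index 0 and the `polarity` attribute refer to the same field. The sketch below uses a hypothetical stand-in built with `collections.namedtuple` (not TextBlob's own class) and made-up scores, to show both access styles without running TextBlob:

```python
from collections import namedtuple

# Stand-in with the same field names as TextBlob's Sentiment(polarity, subjectivity)
Sentiment = namedtuple('Sentiment', ['polarity', 'subjectivity'])

s = Sentiment(polarity=0.25, subjectivity=0.6)   # made-up example scores
print(s[0] == s.polarity)      # True: index 0 is the polarity field
polarity, subjectivity = s     # namedtuples also unpack positionally
print(polarity, subjectivity)  # 0.25 0.6
```

Named access (`w.sentiment.polarity`) is usually easier to read than an index.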

## Calculate the average sentiment polarity of 100 comments stored in the variable myData
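One possible approach is sketched below, assuming `myData` is an iterable of comment strings (its structure is not shown in this notebook). `average_polarity` is a hypothetical helper; its optional `score` parameter lets you swap in another scoring function, and by default it uses TextBlob's polarity:

```python
from statistics import mean


def average_polarity(texts, score=None):
    """Return the mean sentiment score of an iterable of strings.

    score maps one string to a number; by default it is TextBlob's polarity.
    """
    if score is None:
        # Imported here so that passing a custom scorer does not require TextBlob
        from textblob import TextBlob

        def score(text):
            return TextBlob(text).sentiment.polarity

    # statistics.mean raises StatisticsError if texts is empty
    return mean(score(text) for text in texts)
```

If `myData` is a list of strings, call `average_polarity(myData)`. If it instead holds entries shaped like `response_json['results']`, pass the text field, e.g. `average_polarity(entry['body'] for entry in myData)`.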

### Tokenization
Break TextBlobs into words or sentences

In [None]:
w.words

In [None]:
w.sentences

In [None]:
for sentence in w.sentences:
    print(sentence.sentiment)

In [None]:
w

In [None]:
w.words[0]

In [None]:
w.words[0].pluralize()

### WordNet Integration

In [None]:
from textblob import Word
word = Word("frivolous")
word

In [None]:
import nltk
nltk.download('omw-1.4')

In [None]:
nltk.download('wordnet')

In [None]:
word.definitions

In [None]:
b = TextBlob("I am succh a greattt wriiter!")
b.correct()

In [None]:
fw = Word('falibility')
fw.spellcheck()

In [None]:
w = TextBlob(body)
w.words.count('the')

In [None]:
w.words.count('the', case_sensitive=True)
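The two counts differ because the count is case-insensitive unless `case_sensitive=True` is passed. A plain-Python sketch of the same idea follows. It is not TextBlob's implementation, and `count_word` is a hypothetical name:

```python
def count_word(words, target, case_sensitive=False):
    """Count occurrences of target in a list of word tokens."""
    if case_sensitive:
        return sum(1 for word in words if word == target)
    target = target.lower()
    # Compare lowercased forms so 'The' and 'the' both match
    return sum(1 for word in words if word.lower() == target)


tokens = ['The', 'court', 'and', 'the', 'jury']
print(count_word(tokens, 'the'))                       # 2
print(count_word(tokens, 'the', case_sensitive=True))  # 1
```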

In [None]:
w

In [None]:
w.ngrams(n=2)

In [None]:
w.ngrams(n=3)
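An n-gram is a window of `n` consecutive words that slides one word at a time. The sketch below shows the idea in plain Python. It is not TextBlob's code, and it returns tuples, whereas TextBlob returns a list of `WordList` objects:

```python
def ngrams(words, n=3):
    """Return every run of n consecutive words as a tuple."""
    if n <= 0:
        return []
    # A list of length L has L - n + 1 windows; the range is empty when n > L
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


print(ngrams(['the', 'court', 'ruled', 'today'], n=2))
# [('the', 'court'), ('court', 'ruled'), ('ruled', 'today')]
```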

In [None]:
import nltk
nltk.download('stopwords')

In [None]:
from nltk.corpus import stopwords

In [None]:
# stopwords
stop=set(stopwords.words("english"))

In [None]:
stop

In [None]:
# Removing stop words using a set difference; lowercase first because the NLTK stop-word list is lowercase
print(set(word.lower() for word in w.words) - stop)
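A set difference also discards word order and repeated words. If you need the remaining tokens in their original order, a list comprehension that lowercases each word before the lookup keeps both. In this sketch, `remove_stopwords` is a hypothetical helper and the four-word `stop` set is a tiny stand-in for `set(stopwords.words("english"))`:

```python
def remove_stopwords(words, stop):
    """Drop stop words case-insensitively, keeping order and duplicates."""
    return [word for word in words if word.lower() not in stop]


stop = {'the', 'a', 'of', 'and'}  # hypothetical stand-in for the NLTK English list
print(remove_stopwords(['The', 'Department', 'of', 'Justice', 'and', 'the', 'FBI'], stop))
# ['Department', 'Justice', 'FBI']
```

In the notebook, `remove_stopwords(w.words, stop)` applies it to the blog excerpt with the full NLTK list.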