## Agenda

*  Regular Expressions
*  Python PIP
*  Python JSON
*  Pickling

## Regular Expressions (Regex)

#### Why Regex?

In [160]:
text = "This is a string with some email addresses: johndoe@example.com, janedoenew@example.org"

In [230]:
import re

#### Introduction
Regular expressions (regex) are a powerful tool for finding patterns in text. They can be used to extract information from text, to validate data, and to perform a variety of other tasks.

In Python, regular expressions are implemented in the re module. The re module provides a number of functions and constants to work with regular expressions.
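
As a rough sketch of the tasks mentioned above (finding, extracting, and validating), the cell below uses three functions from the re module: re.search, re.findall, and re.fullmatch. The sample log line and the simple date pattern are made up for illustration.

```python
import re

log_line = "Order 1042 shipped on 2023-08-14 to customer 77"

# re.search returns the first match as a Match object (or None if nothing matches)
first_number = re.search(r"\d+", log_line)
print(first_number.group())   # 1042

# re.findall returns every non-overlapping match as a list of strings
all_numbers = re.findall(r"\d+", log_line)
print(all_numbers)            # ['1042', '2023', '08', '14', '77']

# re.fullmatch succeeds only if the entire string fits the pattern,
# which makes it handy for simple validation
date_pattern = r"\d{4}-\d{2}-\d{2}"
print(bool(re.fullmatch(date_pattern, "2023-08-14")))   # True
print(bool(re.fullmatch(date_pattern, "14/08/2023")))   # False
```

The date pattern only checks the shape of the string (four digits, two digits, two digits); it does not check that the date actually exists.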

#### Basics of Regular Expressions
A regular expression is a sequence of characters that defines a pattern. The pattern can be used to match a specific string or a set of strings.

The basic components of a regular expression are:

*  Characters: Ordinary characters such as a or 5 match themselves.
*  Metacharacters: Characters such as . or + that have a special meaning in a pattern instead of matching themselves.
*  Quantifiers: Quantifiers specify how many times the preceding character or group of characters may occur in a pattern.


##### Metacharacters
The following are some of the most common metacharacters:

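*  .: Matches any single character except a newline.
*  *: Matches zero or more occurrences of the preceding character or group of characters.
*  +: Matches one or more occurrences of the preceding character or group of characters.
*  ?: Matches zero or one occurrence of the preceding character or group of characters.
*  [ ]: Matches any one character from the set listed between the brackets, for example [abc] or [a-z].
*  \: Escapes the next character, making it a literal character, or introduces a special sequence such as \d.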
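
The following sketch tries each of these metacharacters on two short, made-up strings so the differences are easy to compare side by side. It also includes *, the zero-or-more counterpart of +.

```python
import re

sample = "cat cot coat ct c.t"
words = "ct cot coot"

# '.' stands for any single character (except a newline)
print(re.findall(r"c.t", sample))     # ['cat', 'cot', 'c.t']

# '+' needs at least one 'o'; '?' allows at most one; '*' allows any number
print(re.findall(r"co+t", words))     # ['cot', 'coot']
print(re.findall(r"co?t", words))     # ['ct', 'cot']
print(re.findall(r"co*t", words))     # ['ct', 'cot', 'coot']

# '[ao]' matches exactly one character, either 'a' or 'o'
print(re.findall(r"c[ao]t", sample))  # ['cat', 'cot']

# '\.' turns the dot into a literal period
print(re.findall(r"c\.t", sample))    # ['c.t']
```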


##### Quantifiers
The following are some of the most common quantifiers:

*  {n}: Matches exactly n occurrences of the preceding character or group of characters.
*  {n,m}: Matches n to m occurrences of the preceding character or group of characters.
*  {n,}: Matches n or more occurrences of the preceding character or group of characters.
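
A small sketch of the three quantifier forms, using a made-up string of repeated letters. The \b word boundaries are an addition here so that each match has to cover a whole word; without them, {3} would also match the first three letters of a longer run.

```python
import re

text = "a aa aaa aaaa aaaaa"

# {n}: exactly three repetitions
exactly_three = re.findall(r"\ba{3}\b", text)
print(exactly_three)    # ['aaa']

# {n,m}: between two and four repetitions
two_to_four = re.findall(r"\ba{2,4}\b", text)
print(two_to_four)      # ['aa', 'aaa', 'aaaa']

# {n,}: four or more repetitions
four_or_more = re.findall(r"\ba{4,}\b", text)
print(four_or_more)     # ['aaaa', 'aaaaa']
```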

##### Examples

Here are some examples of regular expressions:

*  .*: Matches any string that contains no newline, including the empty string.
*  [a-z]+: Matches any string of one or more lowercase letters.
*  \d: Matches any digit.
*  \w: Matches any letter, digit, or underscore.
*  \s: Matches any whitespace character.
*  ^: Matches the beginning of a string.
*  $: Matches the end of a string.
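
The patterns in this list can be tried out on a single made-up sentence, as in the sketch below. Note that \w also accepts the underscore.

```python
import re

line = "Room 42 is on floor 3"

# \d matches one digit at a time
print(re.findall(r"\d", line))        # ['4', '2', '3']

# \w+ matches runs of letters, digits, and underscores
print(re.findall(r"\w+", line))       # ['Room', '42', 'is', 'on', 'floor', '3']

# \s matches each whitespace character; this sentence has five spaces
print(len(re.findall(r"\s", line)))   # 5

# ^ anchors the match to the start of the string, $ to the end
print(re.findall(r"^\w+", line))      # ['Room']
print(re.findall(r"\w+$", line))      # ['3']

# \w treats '_' as a word character but not '-'
print(re.findall(r"\w", "a_b-c"))     # ['a', '_', 'b', 'c']
```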


In [259]:
import re

In [289]:
text = "This is a string with some numbers: 12345"

In [290]:
pattern = r'\d+'  # raw string, so the backslash reaches the regex engine unchanged
matches = re.findall(pattern, text)

print(matches)


['12345']


In [271]:
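# Option 1: pass the pattern string straight to re.finditer
pattern2 = 'is'
matches = re.finditer(pattern2, text)

# Option 2: compile the pattern once and reuse it (this replaces the result above)
pattern = re.compile('is')
matches = pattern.finditer(text)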


# print(matches)
# print(type(matches))


for match in matches:
    print(match)
    print(type(match))
    print(match.start(),match.end())

<re.Match object; span=(2, 4), match='is'>
<class 're.Match'>
2 4
<re.Match object; span=(5, 7), match='is'>
<class 're.Match'>
5 7


In [258]:
text = "This is a string with the word 'the' in it."

pattern = 'the'
matches = re.finditer(pattern, text)
print(matches)
for match in matches:
  print(match)
  print(match.start(), match.end())


<callable_iterator object at 0x116179460>
<re.Match object; span=(22, 25), match='the'>
22 25
<re.Match object; span=(32, 35), match='the'>
32 35


In [200]:
text = "This is a string with some punctuation."
pattern = '.'
matches = re.findall(pattern, text)

print(matches)

# . Matches any character.


['T', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 's', 't', 'r', 'i', 'n', 'g', ' ', 'w', 'i', 't', 'h', ' ', 's', 'o', 'm', 'e', ' ', 'p', 'u', 'n', 'c', 't', 'u', 'a', 't', 'i', 'o', 'n', '.']


In [295]:
print(r'\n')

\n


In [351]:
text = "This is is a string with some punctuation."
pattern = r'\bis'
matches = re.findall(pattern, text)
print(matches)

['is', 'is']


In [304]:
text = "This is a string with some punctuations"
pattern = 's+'
matches = re.findall(pattern, text)
print(matches)

['s', 's', 's', 's', 's']


In [306]:
text = "This is a string with some punctuation."
pattern = '[a-zA-Z]+'  # no comma inside the set; a comma there would match a literal ','
matches = re.findall(pattern, text)
print(matches)

['This', 'is', 'a', 'string', 'with', 'some', 'punctuation']


In [220]:
text = "This is a string with some punctuation."
pattern = r'\.'
matches = re.findall(pattern, text)
print(matches)

['.']


In [307]:
text = "This is a string with some punctuation."
pattern = '^This'
matches = re.findall(pattern, text)
print(matches)

['This']


In [371]:
text = "This is a string with some punctuation."
pattern = r'punctuation\.$'  # escape the dot so it matches only a literal period
matches = re.findall(pattern, text)
print(matches)

['punctuation.']


In [383]:
# text = '''This is a string. with some punctuation.
# This is a string with some punctuation'''
text = 'This is. a string. with some punctuation.'
# pattern = '.*(?!.)$'

pattern = r'\.$'  # a literal period at the very end of the string
matches = re.findall(pattern, text)
print(matches)

['.']


In [338]:
text = " some email addresses: johndoe@example.com, janedoenew@example.org, jp@gmail.com,jp@gmail.net, jp@gmailnew.com"
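pattern = r'\w+@\w+\.\w+'
emails = re.findall(pattern, text)

# (?:com|net) matches either whole word; a set like [com|net] would match any mix of those characters
g_pattern = r'\w+@gmail\.(?:com|net)'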

gmails = re.findall(g_pattern, text)
print(gmails)
# print(emails)

['jp@gmail.com', 'jp@gmail.net']
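
A note on sets versus groups when matching domain endings: square brackets list individual characters, so a set such as [com|net] accepts any run of the letters c, o, m, n, e, t and the | sign, whereas alternation between whole words needs a group such as (?:com|net). The sketch below compares the two on made-up addresses; the trailing \b is an extra safeguard added here, not something the cells above use.

```python
import re

addresses = "jp@gmail.com, jp@gmail.net, jp@gmail.comet, jp@gmail.ten"

# A set matches characters, so any mix of c, o, m, n, e, t (and '|') is accepted
loose = re.findall(r"\w+@gmail\.[com|net]+", addresses)
print(loose)    # ['jp@gmail.com', 'jp@gmail.net', 'jp@gmail.comet', 'jp@gmail.ten']

# A non-capturing group with '|' matches one of the listed words;
# \b rejects endings that merely start with 'com' or 'net'
strict = re.findall(r"\w+@gmail\.(?:com|net)\b", addresses)
print(strict)   # ['jp@gmail.com', 'jp@gmail.net']
```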


In [346]:
text = '''555-444-2221
+91*9025228000
+91-9025228001
+91 9025228002'''

# pattern = r'\+\d\d.\d\d\d\d\d\d\d\d\d\d'
# numbers = re.findall(pattern, text)
# print(numbers)

# pattern = r'\+\d\d.\d+'
# numbers = re.findall(pattern, text)
# print(numbers)

# '+', two digits, then a space or hyphen, then the rest of the number
pattern = r'\+\d\d[\s-]\d+'
numbers = re.findall(pattern, text)
print(numbers)


['+91-9025228001', '+91 9025228002']


## Python PIP

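Pip is the standard package manager for Python. It is used to install, upgrade, uninstall, and list Python packages. In a Jupyter notebook, the %pip magic runs pip for the environment of the running kernel.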
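
Packages installed with pip can also be inspected from Python code. The sketch below is one possible way to look up an installed version with the standard-library importlib.metadata module (Python 3.8+). The helper name installed_version and the made-up package name are assumptions for illustration, not part of pip itself.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if pip is installed in this environment, otherwise None
print(installed_version("pip"))

# A made-up name that should not be installed anywhere
print(installed_version("surely-not-a-real-package-name-xyz"))
```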



In [384]:
%pip --version

pip 23.2.1 from /Users/Z00CVY1/Library/Python/3.9/lib/python/site-packages/pip (python 3.9)
Note: you may need to restart the kernel to use updated packages.


In [385]:
%pip list

Package             Version
------------------- -----------
altgraph            0.17.2
appnope             0.1.3
asttokens           2.2.1
backcall            0.2.0
comm                0.1.4
contourpy           1.1.0
cycler              0.11.0
DateTime            5.2
debugpy             1.6.7.post1
decorator           5.1.1
executing           1.2.0
fonttools           4.42.0
future              0.18.2
importlib-metadata  6.8.0
importlib-resources 6.0.1
ipykernel           6.25.1
ipython             8.14.0
jedi                0.19.0
joblib              1.3.2
jupyter_client      8.3.0
jupyter_core        5.3.1
kiwisolver          1.4.4
macholib            1.15.2
matplotlib          3.7.2
matplotlib-inline   0.1.6
nest-asyncio        1.5.7
numpy               1.25.2
packaging           23.1
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
Pillow              10.0.0
pip                 23.2.1
platformdirs        3.10.0
prompt-toolkit      3.0.39
psutil        

In [2]:
%pip install --user scikit-learn

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [70]:
%pip install --upgrade pip

Defaulting to user installation because normal site-packages is not writeable
Collecting pip
  Downloading pip-23.2.1-py3-none-any.whl (2.1 MB)
     |████████████████████████████████| 2.1 MB 2.6 MB/s eta 0:00:01
Installing collected packages: pip
Successfully installed pip-23.2.1
You should consider upgrading via the '/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


In [33]:
%pip install package_name

Defaulting to user installation because normal site-packages is not writeable
Collecting package_name
  Downloading package_name-0.1.tar.gz (782 bytes)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: package_name
  Building wheel for package_name (setup.py) ... done
  Created wheel for package_name: filename=package_name-0.1-py3-none-any.whl size=1254 sha256=a9f47ff4bdb4f3b513f4300459d7a85307b5ac1dbfb77be146e0b975d9a2fa9b
  Stored in directory: /Users/Z00CVY1/Library/Caches/pip/wheels/67/e6/c3/cbfcab244d830378592564f5e46da23a8aad979c4a958b401a
Successfully built package_name
Installing collected packages: package_name
Successfully installed package_name-0.1


In [4]:
%pip uninstall pandas

Found existing installation: pandas 2.0.3
Uninstalling pandas-2.0.3:
  Would remove:
    /Users/Z00CVY1/Library/Python/3.9/lib/python/site-packages/pandas-2.0.3.dist-info/*
    /Users/Z00CVY1/Library/Python/3.9/lib/python/site-packages/pandas/*
Proceed (Y/n)? ^C
ERROR: Operation cancelled by user
Note: you may need to restart the kernel to use updated packages.


## JSON in Python

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is often used to transmit data between a server and a client, or to store data in a file.

JSON is a text-based format, which means that it can be easily read and written by humans. It is also easy to parse and manipulate using Python.

##### Basics of JSON
A JSON object is an unordered collection of key-value pairs. The keys must be strings, and the values can be strings, numbers, booleans (true or false), null, arrays, or other objects.

An array is an ordered list of values. The values in an array can be of any JSON type, including objects and other arrays.
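
To make the value types concrete, the sketch below parses a small, made-up JSON document with json.loads and prints the Python type that each JSON value is converted to.

```python
import json

sample = '''{
  "name": "Ada",
  "age": 36,
  "height": 1.7,
  "tags": ["a", "b"],
  "active": true,
  "email": null,
  "address": {"city": "Anytown"}
}'''

parsed = json.loads(sample)

# Show which Python type each JSON value becomes
for key, value in parsed.items():
    print(key, '->', type(value).__name__)

# name -> str
# age -> int
# height -> float
# tags -> list
# active -> bool
# email -> NoneType
# address -> dict
```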

Here is an example of a JSON object:




In [6]:
import json

In [45]:
people = '''{
  "name": "John Doe",
  "age": 30,
  "address": {
    "street": "123 Main Street",
    "city": "Anytown",
    "state": "CA"
  }
}'''
print(type(people))

<class 'str'>


In [47]:
data = json.loads(people)

print(type(data))
print(data)


<class 'dict'>
{'name': 'John Doe', 'age': 30, 'address': {'street': '123 Main Street', 'city': 'Anytown', 'state': 'CA'}}


In [32]:
for i in data['address']:
    print(i,':', data['address'][i])

street : 123 Main Street
city : Anytown
state : CA


In [19]:
add = data['address']
add['state']

'CA'

In [76]:
people_ = '''
{
  "people_collection":[
  {
    "name": "John Doe",
    "age": 30,
    "email": null,
    "address": {
      "street": "123 Main Street",
      "city": "Anytown",
      "state": "CA"
    }
  },
  {
    "name": "Jack Snyder",
    "age": 43,
    "email": null,
    "address": {
      "street": "222 Main Street",
      "city": "Newyork",
      "state": "NY"
    }
  }  
]
}
'''
print(type(people_))

<class 'str'>

In [77]:
data = json.loads(people_)

print(type(data))
print(data)



<class 'dict'>
{'people_collection': [{'name': 'John Doe', 'age': 30, 'email': None, 'address': {'street': '123 Main Street', 'city': 'Anytown', 'state': 'CA'}}, {'name': 'Jack Snyder', 'age': 43, 'email': None, 'address': {'street': '222 Main Street', 'city': 'Newyork', 'state': 'NY'}}]}


In [62]:
print(type(data['people_collection']))
# data['people_collection']
data['people_collection'][0]['name']



<class 'list'>


'John Doe'

In [64]:
for person in data['people_collection']:
    print(person)

{'name': 'John Doe', 'age': 30, 'email': None, 'address': {'street': '123 Main Street', 'city': 'Anytown', 'state': 'CA'}}
{'name': 'Jack Snyder', 'age': 43, 'email': None, 'address': {'street': '222 Main Street', 'city': 'Newyork', 'state': 'NY'}}


In [72]:
for person in data['people_collection']:
    print(person['name'], ' is from ', person['address']['city'])
    


John Doe  is from  Anytown
Jack Snyder  is from  Newyork


In [78]:
for person in data['people_collection']:
    print(person)

{'name': 'John Doe', 'age': 30, 'email': None, 'address': {'street': '123 Main Street', 'city': 'Anytown', 'state': 'CA'}}
{'name': 'Jack Snyder', 'age': 43, 'email': None, 'address': {'street': '222 Main Street', 'city': 'Newyork', 'state': 'NY'}}


In [79]:
for person in data['people_collection']:
    del person['email']

In [80]:
for person in data['people_collection']:
    print(person)

{'name': 'John Doe', 'age': 30, 'address': {'street': '123 Main Street', 'city': 'Anytown', 'state': 'CA'}}
{'name': 'Jack Snyder', 'age': 43, 'address': {'street': '222 Main Street', 'city': 'Newyork', 'state': 'NY'}}


In [87]:
# json.dumps converts the Python object back into a JSON-formatted string
json_str = json.dumps(data, indent=2)

print(type(json_str))
print(json_str)


<class 'str'>
{
  "people_collection": [
    {
      "name": "John Doe",
      "age": 30,
      "address": {
        "street": "123 Main Street",
        "city": "Anytown",
        "state": "CA"
      }
    },
    {
      "name": "Jack Snyder",
      "age": 43,
      "address": {
        "street": "222 Main Street",
        "city": "Newyork",
        "state": "NY"
      }
    }
  ]
}


In [88]:
with open('states.json') as f:
    data = json.load(f)

In [89]:
data

{'states': [{'key': 'Andhra Pradesh', 'val': 'Hyderabad'},
  {'key': 'Arunachal Pradesh', 'val': 'Itanagar'},
  {'key': 'Assam', 'val': 'Dispur'},
  {'key': 'Bihar', 'val': 'Patna'},
  {'key': 'Chhattisgarh', 'val': 'Raipur'},
  {'key': 'Goa', 'val': 'Panaji'},
  {'key': 'Gujarat', 'val': 'Gandhinagar'},
  {'key': 'Haryana', 'val': 'Chandigarh'},
  {'key': 'Himachal Pradesh', 'val': 'Shimla'},
  {'key': 'Jammu & Kashmir', 'val': 'Srinagar(Summer)/Jammu(Winter)'},
  {'key': 'Jharkhand', 'val': 'Ranchi'},
  {'key': 'Karnataka', 'val': 'Bengaluru'},
  {'key': 'Kerala', 'val': 'Thiruvananthapuram'},
  {'key': 'Madhya Pradesh', 'val': 'Bhopal'},
  {'key': 'Maharashtra', 'val': 'Mumbai'},
  {'key': 'Manipur', 'val': 'Imphal'},
  {'key': 'Meghalaya', 'val': 'Shillong'},
  {'key': 'Mizoram', 'val': 'Aizawl'},
  {'key': 'Nagaland', 'val': 'Kohima'},
  {'key': 'Odisha', 'val': 'Bhubaneswar'},
  {'key': 'Punjab', 'val': 'Chandigarh'},
  {'key': 'Rajasthan', 'val': 'Jaipur'},
  {'key': 'Sikkim', '

In [94]:
for state in data['states']:
    print(state['val'], ' is the capital of ', state['key'])

Hyderabad  is the capital of  Andhra Pradesh
Itanagar  is the capital of  Arunachal Pradesh
Dispur  is the capital of  Assam
Patna  is the capital of  Bihar
Raipur  is the capital of  Chhattisgarh
Panaji  is the capital of  Goa
Gandhinagar  is the capital of  Gujarat
Chandigarh  is the capital of  Haryana
Shimla  is the capital of  Himachal Pradesh
Srinagar(Summer)/Jammu(Winter)  is the capital of  Jammu & Kashmir
Ranchi  is the capital of  Jharkhand
Bengaluru  is the capital of  Karnataka
Thiruvananthapuram  is the capital of  Kerala
Bhopal  is the capital of  Madhya Pradesh
Mumbai  is the capital of  Maharashtra
Imphal  is the capital of  Manipur
Shillong  is the capital of  Meghalaya
Aizawl  is the capital of  Mizoram
Kohima  is the capital of  Nagaland
Bhubaneswar  is the capital of  Odisha
Chandigarh  is the capital of  Punjab
Jaipur  is the capital of  Rajasthan
Gangtok  is the capital of  Sikkim
Chennai  is the capital of  Tamil Nadu
Hyderabad  is the capital of  Telangana
Agart
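
The cells above read JSON from a file with json.load. The reverse operation is json.dump, which writes a Python object to an open file. The sketch below is a minimal round trip under a few assumptions: it uses a temporary directory, a hypothetical file name (capitals.json), and a two-entry sample rather than the full states.json.

```python
import json
import os
import tempfile

capitals = {"states": [{"key": "Goa", "val": "Panaji"},
                       {"key": "Bihar", "val": "Patna"}]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "capitals.json")   # hypothetical file name

    # json.dump writes the object to an open text file
    with open(path, "w", encoding="utf-8") as f:
        json.dump(capitals, f, indent=2)

    # json.load reads the file back into Python objects
    with open(path, encoding="utf-8") as f:
        restored = json.load(f)

print(restored == capitals)   # True
```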

## Pickling in Python

The Python pickle module serializes and de-serializes Python object structures. Serializing (pickling) converts an object into a byte stream that can be written to a file on disk; de-serializing (unpickling) rebuilds the object from that byte stream. Most Python objects can be pickled, although a few, such as open file handles and lambda functions, cannot.

In [99]:
import pickle

example_dict = {1:"6", 2:"2", 3:"£"}

pickle_out = open ("dict.pickle", "wb")
pickle.dump(example_dict,pickle_out)
pickle_out.close()

In [96]:
with open ('dict2.pickle','wb') as pickle_out:
    pickle.dump(example_dict,pickle_out)


In [256]:
with open ('dict2.pickle','rb') as pic:
    example_dict2 = pickle.load(pic)
print(example_dict2)

{1: '6', 2: '2', 3: '£'}
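
The file-based cells above use pickle.dump and pickle.load. As a complementary sketch, pickle.dumps and pickle.loads do the same round trip in memory, and the made-up dictionary below holds types that JSON cannot represent directly (a tuple, a set, bytes, and an integer key). One caution that applies to any use of pickle: only unpickle data from a source you trust, because loading a pickle can execute arbitrary code.

```python
import pickle

original = {
    "point": (1, 2),        # tuple
    "tags": {"a", "b"},     # set
    "raw": b"\x00\x01",     # bytes
    3: "integer key",       # non-string key
}

# dumps returns a bytes object instead of writing to a file
payload = pickle.dumps(original)
print(type(payload))            # <class 'bytes'>

# loads rebuilds an equal, but separate, object from those bytes
restored = pickle.loads(payload)
print(restored == original)     # True
print(restored is original)     # False
print(type(restored["point"]).__name__, type(restored["tags"]).__name__)   # tuple set
```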
