# Python Basics 

## 1. Common String Operations

**Convert a string to a list and a list to a string**

In [1]:
text = "John loves apples"
print("STRING:",text)

# convert string to list
tokens = text.split(' ')
print("LIST:",tokens)

# convert list to string
text_from_list = ' '.join(tokens)
print("STRING:",text_from_list)

STRING: John loves apples
LIST: ['John', 'loves', 'apples']
STRING: John loves apples
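`join()` only accepts strings, so a list of numbers has to be converted first. A minimal sketch, using a made-up list of integers:

```python
numbers = [3, 14, 15]

# str.join() only accepts strings; joining integers directly raises TypeError,
# so each item is converted with str() first
joined = "-".join(map(str, numbers))
print(joined)  # 3-14-15
```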


**Splitting a string**

In [2]:
txt = "apple#banana#cherry#orange"

x = txt.split("#")

print(x)

['apple', 'banana', 'cherry', 'orange']
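A sketch that contrasts `split(' ')` with `split()` and shows the `maxsplit` parameter. The messy input string is invented for illustration:

```python
messy = "apple  banana\tcherry\norange"

# split(" ") treats every single space as a separator, so two spaces in a row
# produce an empty string, and tabs/newlines are not split at all
print(messy.split(" "))

# split() with no argument splits on any run of whitespace and drops empty strings
print(messy.split())  # ['apple', 'banana', 'cherry', 'orange']

# maxsplit limits the number of splits, counting from the left
print("apple#banana#cherry#orange".split("#", maxsplit=1))  # ['apple', 'banana#cherry#orange']
```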


**Check that a string contains only alphabetic characters (letters)**

In [3]:
w1 = "apple"
w1.isalpha()

True

In [4]:
w1 = "apple5"
w1.isalpha()

False
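A few more cases, sketched on made-up strings, showing what `isalpha()` counts as a letter. The related `isdigit()` and `isalnum()` predicates are included for comparison:

```python
print("apple pie".isalpha())  # False: the space is not a letter
print("".isalpha())           # False: the empty string has no letters
print("Amélie".isalpha())     # True: non-ASCII letters count as alphabetic
print("2024".isdigit())       # True: digits only
print("apple5".isalnum())     # True: letters and digits only
```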

## 2. Common operations on files 

**Printing out the names of the files contained in a directory**

In [5]:
import os

with os.scandir('../data/') as entries:
    for entry in entries:
        print(entry.name)

semeval-2020-task-7-dataset.zip
collect_webnlg_categories.zip
categories.txt~
bbc.zip
f30kE-captions-bio.zip
gpt2-medium.json
IMDB-Movie-Data.csv
movies-sf.txt
copyone
breton.csv
movies_500.csv
wkp_categories.txt
get_data.py
create_wkp_dir.sh~
eli5.json
food.csv
webnlg-test.txt
sanders-twitter-sentiment.csv
baseball.txt
amelie-with-lines.txt
ameliepoulain.txt
f30kE-captions-bio.txt
wkp_sorted
breton.xlsx
qa.txt
wkp
breton_dev_annot_sample.xlsx
bbc
wkp.zip
wiki.en.filtered.vec
categories.txt
wkp_sorted.zip
baseball.csv.zip
wiki.en.filtered.zip
baseball.csv
webnlg.txt
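`os.scandir()` yields entries in arbitrary order and also lists subdirectories (such as `bbc` or `wkp` above, assuming those are directories). A minimal sketch that keeps only regular files and sorts their names. `list_files` is a hypothetical helper name:

```python
import os

def list_files(directory):
    """Return the sorted names of the regular files (not subdirectories) in directory."""
    with os.scandir(directory) as entries:
        # entry.is_file() is False for subdirectories
        return sorted(entry.name for entry in entries if entry.is_file())
```

Usage would then be `list_files('../data/')`.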


**Open a file and store its content in a string**

In [6]:
# infile is a reference to a file object
with open("../data/ameliepoulain.txt") as infile:
    file_content = infile.read()
    print("STRING:", file_content)

STRING: Amelie is a story about a girl named Amelie whose childhood was suppressed by her Father's mistaken concerns of a heart defect. With these concerns Amelie gets hardly any real life contact with other people. This leads Amelie to resort to her own fantastical world and dreams of love and beauty. She later on becomes a young woman and moves to the central part of Paris as a waitress. After finding a lost treasure belonging to the former occupant of her apartment, she decides to return it to him. After seeing his reaction and his new found perspective - she decides to devote her life to the people around her. Such as, her father who is obsessed with his garden-gnome, a failed writer, a hypochondriac, a man who stalks his ex girlfriends, the "ghost", a suppressed young soul, the love of her life and a man whose bones are as brittle as glass. But after consuming herself with these escapades - she finds out that she is disregarding her own life and damaging her quest for love. Amelie

**Open a file and store its lines into a list**
- The **readlines** method returns the contents of the entire file as a list of strings, where each item in the list represents one line of the file.
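- The **readline** method reads a single line from the file and returns it as a string. The returned string keeps the newline character (`\n`) at the end, except possibly for the last line of the file; once the end of the file is reached, readline returns an empty string.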

In [7]:
with open("../data/amelie-with-lines.txt") as f:
    lines = f.readlines()
    for l in lines:
        print("LIST ITEM:", l)

LIST ITEM: Amelie is a story about a girl named Amelie whose childhood was suppressed by her Father's mistaken concerns of a heart defect.

LIST ITEM: With these concerns Amelie gets hardly any real life contact with other people.

LIST ITEM: This leads Amelie to resort to her own fantastical world and dreams of love and beauty.

LIST ITEM: She later on becomes a young woman and moves to the central part of Paris as a waitress.

LIST ITEM: After finding a lost treasure belonging to the former occupant of her apartment, she decides to return it to him.

LIST ITEM: After seeing his reaction and his new found perspective - she decides to devote her life to the people around her.

LIST ITEM: Such as, her father who is obsessed with his garden-gnome, a failed writer, a hypochondriac, a man who stalks his ex girlfriends, the "ghost", a suppressed young soul, the love of her life and a man whose bones are as brittle as glass.

LIST ITEM: But after consuming herself with these escapades - she 

In [1]:
with open("../data/amelie-with-lines.txt") as f:
    l = f.readline()
    print("LIST ITEM:", l)

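The blank lines in the readlines output above appear because each line keeps its newline character and `print` adds another one. A minimal sketch that iterates over the file object directly and strips the newlines. `read_lines` is a hypothetical helper, and the UTF-8 encoding is an assumption about the data:

```python
def read_lines(path, encoding="utf-8"):
    """Return the lines of a text file without their trailing newline characters."""
    with open(path, encoding=encoding) as f:
        # iterating over a file object yields one line at a time, newline included;
        # rstrip("\n") removes that newline
        return [line.rstrip("\n") for line in f]
```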

**Writing to a file**

https://www.w3schools.com/python/python_file_write.asp

In [8]:
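# write to the file ("w" mode creates it, or overwrites any existing content)
with open("demofile2.txt", "w") as f:
    f.write("Now the file has more content!")

# open and read the file after writing; "with" closes the file automatically
with open("demofile2.txt", "r") as f:
    print(f.read())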

Now the file has more content!
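Opening a file in `"w"` mode replaces its content. A sketch of appending instead with `"a"` mode and `writelines()`. The file name `demofile3.txt` is made up for this example, and note that `writelines()` does not add newline characters itself:

```python
# create the file with a first line ("w" overwrites)
with open("demofile3.txt", "w") as f:
    f.write("First line\n")

# "a" (append) mode adds to the end of the file instead of overwriting it
with open("demofile3.txt", "a") as f:
    f.writelines(["Second line\n", "Third line\n"])

with open("demofile3.txt") as f:
    content = f.read()
print(content)
```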


## Common operations on texts

**Removing punctuation** 

In [9]:
# Define a translation table that maps each punctuation sign to the empty string
# i.e., that deletes punctuation signs
import string
translator = str.maketrans('', '', string.punctuation)

text = 'string with "punctuation" inside of it! Does this work? I hope so.'

# Apply the translation table to a string
# This deletes all punctuation signs in that string
text.translate(translator)

'string with punctuation inside of it Does this work I hope so'
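Deleting punctuation can glue words together when there is no space after the sign (`end.Start` becomes `endStart`). A variant sketch maps each sign to a space and then normalizes the whitespace. Note that `string.punctuation` only contains ASCII punctuation, so signs such as curly quotes are left untouched:

```python
import string

# map every ASCII punctuation sign to a space instead of deleting it
to_space = str.maketrans(string.punctuation, " " * len(string.punctuation))

glued = "end.Start of the next sentence!"
# translate, then split()/join() collapse runs of spaces into single spaces
cleaned = " ".join(glued.translate(to_space).split())
print(cleaned)  # end Start of the next sentence
```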

**Lowercasing a token**

In [10]:
token = "Amelie"
print(token.lower())

amelie
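For case-insensitive comparisons, `str.casefold()` is more aggressive than `lower()`. A short sketch, which also lowercases a list of tokens (the example words are made up):

```python
word = "Straße"
print(word.lower())     # straße: lower() keeps the German sharp s
print(word.casefold())  # strasse: casefold() maps it to "ss"

# lowercase every token of a list
tokens = ["John", "loves", "Apples"]
lowered = [t.lower() for t in tokens]
print(lowered)          # ['john', 'loves', 'apples']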


**Pretty printing with f-strings**

A formatted string literal or f-string is a string literal that is prefixed with 'f' or 'F'. These strings may contain replacement fields, which are expressions delimited by curly braces {}. While other string literals always have a constant value, formatted strings are really expressions evaluated at run time.

An example of a formatted string literal:


In [11]:
name = "Fred"
print(f"He said his name is {name}.")

He said his name is Fred.
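A few more f-string features, sketched with made-up values: arbitrary expressions inside the braces, and a format specification after a colon:

```python
name = "Fred"
price = 3.14159
items = ["pen", "book"]

# any expression can go inside the braces
summary = f"{name.upper()} bought {len(items)} items"
# a format specification after ':' controls the rendering, here 2 decimals
price_line = f"Price: {price:.2f}"
# '>' right-aligns and '<' left-aligns the value in a field of width 6
aligned = f"|{name:>6}|{name:<6}|"

print(summary)     # FRED bought 2 items
print(price_line)  # Price: 3.14
print(aligned)     # |  Fred|Fred  |
```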


## Common operations on dictionaries

#### Sorting a dictionary by values

In [21]:
d = {"Tom": 67, "Tina": 54, "Akbar": 87, "Kane": 43, "Divya": 73}
# create a list of (value, key) pairs sorted by value
l = sorted((value, key) for (key, value) in d.items())
# create a dictionary from the sorted list (dictionaries keep insertion order)
sortdict = dict([(k, v) for v, k in l])
print(sortdict)

{'Kane': 43, 'Tina': 54, 'Tom': 67, 'Divya': 73, 'Akbar': 87}
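A more compact sketch of the same idea sorts the `(key, value)` pairs with a `key` function. Since `sorted()` is stable, ties keep their original order, whereas the `(value, key)` tuples above break ties by key:

```python
scores = {"Tom": 67, "Tina": 54, "Akbar": 87, "Kane": 43, "Divya": 73}

# sort the (key, value) pairs on the value; dict() keeps that order (Python 3.7+)
by_value = dict(sorted(scores.items(), key=lambda item: item[1]))
print(by_value)

# reverse=True sorts from the highest to the lowest value
by_value_desc = dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))
print(by_value_desc)
```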


**Creating a dictionary from a list of pairs**

In [13]:
d = dict([(1,'a'),(3,'b')])
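When the list of pairs contains the same key more than once, the last pair wins. A small sketch with made-up pairs:

```python
pairs = [("a", 1), ("b", 2), ("a", 3)]

# "a" keeps its first position but takes the value of its last pair
pairs_dict = dict(pairs)
print(pairs_dict)  # {'a': 3, 'b': 2}
```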

#### Creating a dictionary from two lists

In [1]:
# create a list with the students' names
name = ['sravan', 'ojaswi', 'rohith', 'gnanesh', 'bobby']

# create a list with the students' ages
age = [23, 21, 32, 11, 23]

# pair each name with the age at the same position using zip(), then build the dictionary
dict(zip(name, age))

{'sravan': 23, 'ojaswi': 21, 'rohith': 32, 'gnanesh': 11, 'bobby': 23}

#### Creating a dictionary from a list of pairs with a dictionary comprehension

In [2]:
# create a list of (name, age) pairs
data = [('sravan', 23), ('ojaswi', 15),
        ('rohith', 8), ('gnanesh', 4), ('bobby', 20)]

# build the dictionary with a dictionary comprehension
{key: value for (key, value) in data}


{'sravan': 23, 'ojaswi': 15, 'rohith': 8, 'gnanesh': 4, 'bobby': 20}

**Creating a dictionary with `collections.defaultdict`**

In [14]:
import collections

# The default factory gives each new token the current size of the dictionary,
# i.e. the next unused integer id
token2int = collections.defaultdict(lambda: len(token2int))

# Fill the dictionary: looking up an unseen token inserts it with a new id
for token in ["The", "woman", "put", "the", "book", "on", "the", "table"]:
    token2int[token]

# Print it out
token2int.items()

dict_items([('The', 0), ('woman', 1), ('put', 2), ('the', 3), ('book', 4), ('on', 5), ('table', 6)])
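A sketch, using a made-up vocabulary, of which operations trigger the default factory of a `defaultdict` and which do not, and of converting it into a plain `dict` afterwards:

```python
import collections

vocab = collections.defaultdict(lambda: len(vocab))
for token in ["the", "cat", "sat"]:
    vocab[token]  # the lookup alone inserts the token with the next id

# membership tests and .get() do not call the default factory, so nothing is added
print("dog" in vocab)    # False
print(vocab.get("dog"))  # None
print(len(vocab))        # 3

# a square-bracket lookup of an unseen key does add it
print(vocab["dog"])      # 3

# a plain dict copy raises KeyError for unknown tokens instead of growing silently
plain_vocab = dict(vocab)
```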

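**Getting the key-value pairs of a dictionary**

`d.items()` returns a view of the (key, value) pairs rather than a list; wrap it in `list()` if an actual list is needed.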

In [15]:
d.items()

dict_items([(1, 'a'), (3, 'b')])

**Getting the keys of a dictionary**

In [16]:
d.keys()

dict_keys([1, 3])
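The `keys()`, `values()` and `items()` views are not copies: they reflect later changes to the dictionary. A short sketch with a made-up dictionary:

```python
ages = {"Tom": 67, "Tina": 54}
keys = ages.keys()           # a view, not a copy

ages["Kane"] = 43
print(list(keys))            # ['Tom', 'Tina', 'Kane']: the view sees the new key
print(list(ages.values()))   # [67, 54, 43]
```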

**Inverting a dictionary**

In [17]:
token2int = dict([('cat',1),('dog',0)])
int2token = dict((i,t) for (t,i) in token2int.items())
print('token2int',token2int.items())
print('int2tokens',int2token.items())

token2int dict_items([('cat', 1), ('dog', 0)])
int2tokens dict_items([(1, 'cat'), (0, 'dog')])
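The same inversion can be written as a dictionary comprehension. Inverting only works cleanly when the values are unique (and hashable); otherwise later keys overwrite earlier ones. A sketch with made-up dictionaries:

```python
token2id = {"cat": 1, "dog": 0}
id2token = {i: t for t, i in token2id.items()}
print(id2token)  # {1: 'cat', 0: 'dog'}

# if several keys share a value, only the last one survives the inversion
colors = {"apple": "red", "cherry": "red", "banana": "yellow"}
by_color = {c: fruit for fruit, c in colors.items()}
print(by_color)  # {'red': 'cherry', 'yellow': 'banana'}
```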


**Printing out the key-value pairs of a dictionary**

In [18]:
for key, value in token2int.items():
    print('{} : {}'.format(key, value))

cat : 1
dog : 0
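A sketch of the same loop written with an f-string, iterating in key order and aligning the keys in a column. The dictionary and the column width of 6 are arbitrary choices for this example:

```python
scores = {"Tom": 67, "Tina": 54, "Akbar": 87}

# sorted() orders the pairs by key; '<6' left-aligns each key in 6 characters
lines = [f"{key:<6}: {value}" for key, value in sorted(scores.items())]
print("\n".join(lines))
```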


#### Creating a frequency distribution from a sequence
- each time a missing key is looked up, `defaultdict(int)` adds it to the dictionary with the default value 0 (the value returned by `int()`)

In [4]:
from collections import defaultdict

s = 'mississippi'
d = defaultdict(int)
for k in s:
    d[k] += 1
d

defaultdict(int, {'m': 1, 'i': 4, 's': 4, 'p': 2})
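The standard library's `collections.Counter` builds the same frequency distribution in one call and adds helpers such as `most_common()`. A minimal sketch on the same string:

```python
from collections import Counter

counts = Counter("mississippi")
print(counts["s"])            # 4
print(counts["z"])            # 0: a missing key counts as zero and is not added
print(counts.most_common(3))  # the 3 most frequent items with their counts
```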

## Common operations on tuples or lists

**Zipping tuples or lists**

The `zip()` function returns a zip object, an iterator of tuples: the first tuple pairs together the first item of each iterable passed in, the second tuple pairs together the second items, and so on.

If the passed iterables have different lengths, the shortest one determines the length of the new iterator.

In [None]:
a = ("John", "Charles", "Mike")
b = ("Jenny", "Christy", "Monica")

x = zip(a, b)
for i in x:
    print(i)

In [None]:
numbers = [1, 2, 3]
letters = ['a', 'b', 'c']
zipped = zip(numbers, letters)
zipped  # Holds an iterator object
# convert iterator to list
list(zipped)
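Two more `zip()` behaviors, sketched with made-up data. `zip(*pairs)` reverses the operation ("unzips"), and a zip object is an iterator, so it can only be consumed once:

```python
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]

# zip(*pairs) passes each pair as a separate argument, which "unzips" the list
nums, chars = zip(*pairs)
print(nums)   # (1, 2, 3)
print(chars)  # ('a', 'b', 'c')

zipped_once = zip([1, 2], ['a', 'b'])
first = list(zipped_once)   # [(1, 'a'), (2, 'b')]
second = list(zipped_once)  # []: the iterator is already exhausted
```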

**Slicing lists**
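A minimal sketch of the slice syntax `list[start:stop:step]` on a made-up list:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

# start is included, stop is excluded
print(letters[1:4])  # ['b', 'c', 'd']
print(letters[:2])   # ['a', 'b']: the first two items
print(letters[-2:])  # ['e', 'f']: the last two items
print(letters[::2])  # ['a', 'c', 'e']: every second item
```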

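**Reversing a list**

In [None]:
numbers = [1, 2, 3, 4, 5, 6, 7]

# a slice with step -1 returns a new list with the items in reverse order
print(numbers[::-1])

# the same slicing syntax works on NumPy arrays
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[::-1])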
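Slicing with `[::-1]` creates a new list. Two other ways to reverse, sketched on a made-up list: the built-in `reversed()` and the in-place `list.reverse()`:

```python
numbers = [1, 2, 3, 4, 5]

# reversed() returns an iterator that walks the list backwards; list() collects it
reversed_copy = list(reversed(numbers))
print(reversed_copy)  # [5, 4, 3, 2, 1]

# list.reverse() reverses the list in place and returns None
numbers.reverse()
print(numbers)        # [5, 4, 3, 2, 1]
```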


**Creating a list of integers from 0 to 9**

In [None]:
print(list(range(10)))
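`range()` also accepts a start value and a step. A short sketch:

```python
print(list(range(2, 10)))      # [2, 3, 4, 5, 6, 7, 8, 9]: start=2, stop=10 excluded
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]: step of 3
print(list(range(10, 0, -2)))  # [10, 8, 6, 4, 2]: a negative step counts down
```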

**Using the built-in enumerate function to create (position, element) pairs from a list of elements**

https://book.pythontips.com/en/latest/enumerate.html

In [20]:
l = ["a", "b", "c"]
# a list comprehension keeps the pairs in order (a set comprehension would not)
[(i, t) for i, t in enumerate(l)]

[(0, 'a'), (1, 'b'), (2, 'c')]
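`enumerate()` accepts a `start` argument, which is handy for numbering items from 1. A sketch with made-up tokens:

```python
tokens = ["John", "loves", "apples"]

# start=1 numbers the items from 1 instead of 0
for position, token in enumerate(tokens, start=1):
    print(position, token)
```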

### Printing

**Format method**

- The `format()` method formats the specified value(s) and inserts them into the string's placeholders.
- A placeholder is defined using curly brackets: `{}`. It can be empty, contain a position (`{0}`), or contain a name (`{fname}`), as in the examples below.
- The `format()` method returns the formatted string.

In [None]:
txt1 = "My name is {fname}, I'm {age}".format(fname = "John", age = 36)
txt2 = "My name is {0}, I'm {1}".format("John",36)
txt3 = "My name is {}, I'm {}".format("John",36)

print(txt1)
print(txt2)
print(txt3)
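Placeholders can also carry a format specification after a colon, and the same position can be reused. A sketch with made-up values:

```python
two_decimals = "{:.2f}".format(3.14159)        # '3.14'
right = "{:>8}|".format("right")               # '   right|': right-aligned, width 8
echo = "{0} and {0} again".format("echo")      # a position can be reused
centered = "{name:^9}|".format(name="mid")     # '   mid   |': centered, width 9

print(two_decimals)
print(right)
print(echo)
print(centered)
```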

**Star operator**

Calling `print(*items)` is equivalent to calling `print(items[0], items[1], ..., items[n])`: the asterisk unpacks the list and passes each of its items to `print` as a separate argument, without us needing to specify how many items the list contains. Since `print` separates its arguments with a space by default, the items are printed with a space between them.

You can read more about it here: https://treyhunner.com/2018/10/asterisks-in-python-what-they-are-and-how-to-use-them/

In [None]:
print(*['jdoe is', 42, 'years old'])
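A sketch of two related uses of the star operator: the `sep` argument of `print` changes the separator, and the same unpacking works for any function call. The `describe` function is a hypothetical helper written for this example:

```python
words = ['jdoe is', 42, 'years old']

# sep replaces the default space between the unpacked arguments
print(*words, sep=' | ')   # jdoe is | 42 | years old

# hypothetical helper: *words fills its three parameters in order
def describe(name, age, unit):
    return f"{name} {age} {unit}"

print(describe(*words))    # jdoe is 42 years old
```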

### List comprehension

In [1]:
list_of_lists = [['4', '8'], ['4', '2', '28'], ['1', '12'], ['3', '6', '2']]

[int(i) for sublist in list_of_lists for i in sublist]

[4, 8, 4, 2, 28, 1, 12, 3, 6, 2]

In [2]:
[[int(j) for j in i] for i in list_of_lists]

[[4, 8], [4, 2, 28], [1, 12], [3, 6, 2]]

In [12]:
list_1 = [2, 6, 7, 3]
list_2 = [1, 4, 2]

list_3 = [ (x, y) for x in list_1 for y in list_2 ]

print(list_3)

[(2, 1), (2, 4), (2, 2), (6, 1), (6, 4), (6, 2), (7, 1), (7, 4), (7, 2), (3, 1), (3, 4), (3, 2)]
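A comprehension can also filter items with a trailing `if` clause, or transform them with a conditional expression placed before the `for`. A sketch reusing the flattened numbers above:

```python
numbers = [4, 8, 4, 2, 28, 1, 12, 3, 6, 2]

# a trailing if clause keeps only the items that satisfy the condition
evens = [n for n in numbers if n % 2 == 0]
print(evens)   # [4, 8, 4, 2, 28, 12, 6, 2]

# a conditional expression before the for transforms every item instead of filtering
labels = ["even" if n % 2 == 0 else "odd" for n in numbers[3:6]]
print(labels)  # ['even', 'even', 'odd']
```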


### Ordered intersection of lists

In [6]:
list_1 = [3,2,1,2]
list_2 = [2,3,4,2]

In [7]:
set_2 = frozenset(list_2)
intersection = [x for x in list_1 if x in set_2]
intersection

[3, 2, 2]

In [10]:
set_2

frozenset({2, 3, 4})
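The comprehension above keeps duplicates from `list_1` (`[3, 2, 2]`), while `set(list_1) & set(list_2)` would drop duplicates but lose the order. A sketch of a variant that keeps the order and each item only once; `ordered_intersection_unique` is a hypothetical helper name:

```python
def ordered_intersection_unique(list_1, list_2):
    """Items of list_1 that also occur in list_2, in list_1's order, each kept once."""
    lookup = frozenset(list_2)  # set membership tests are O(1) on average
    seen = set()
    result = []
    for x in list_1:
        if x in lookup and x not in seen:
            seen.add(x)
            result.append(x)
    return result

print(ordered_intersection_unique([3, 2, 1, 2], [2, 3, 4, 2]))  # [3, 2]
```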

In [1]:
def align(a, b):
    return (a, b)

a = [1, 2, 3]
c = [1, 2, 3, 4]

# map() calls align(a[i], c[i]) for each position; like zip(), it stops at the shortest input
x = map(align, a, c)
for pair in x:
    print(pair)

(1, 1)
(2, 2)
(3, 3)
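For simple pairing, `zip()` produces the same tuples as `map(align, a, c)`, and `map()` also accepts a lambda instead of a named function. A sketch on the same lists:

```python
a = [1, 2, 3]
c = [1, 2, 3, 4]

# zip() pairs the items directly, stopping at the shortest list
pairs = list(zip(a, c))
print(pairs)  # [(1, 1), (2, 2), (3, 3)]

# map() with a lambda: no named helper function is needed
sums = list(map(lambda x, y: x + y, a, c))
print(sums)   # [2, 4, 6]
```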
