# Python Basics 

## 1. Common String Operations

**Convert a string to a list and a list to a string**

In [1]:
text = "John loves apples"
print("STRING:",text)

# convert string to list
tokens = text.split(' ')
print("LIST:",tokens)

# convert list to string
text_from_list = ' '.join(tokens)
print("STRING:",text_from_list)

STRING: John loves apples
LIST: ['John', 'loves', 'apples']
STRING: John loves apples


**Splitting a string**

In [2]:
txt = "apple#banana#cherry#orange"

x = txt.split("#")

print(x)

['apple', 'banana', 'cherry', 'orange']
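
The separator argument matters more than it may seem. As a small sketch (the sample string `spaced` is made up for illustration): with an explicit `' '` separator, consecutive spaces yield empty strings, whereas `split()` with no argument splits on any run of whitespace.

```python
# hypothetical sample containing a double space and a trailing space
spaced = "John  loves apples "

by_space = spaced.split(' ')    # explicit separator: empty strings appear
by_whitespace = spaced.split()  # no separator: runs of whitespace collapse

print(by_space)
print(by_whitespace)
```

For tokenising free text, the argument-less form is usually the safer choice.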


**Check that the string only contains alphabetic characters**

In [3]:
w1 = "apple"
w1.isalpha()

True

In [4]:
w1 = "apple5"
w1.isalpha()

False
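
Related checks exist for digits and for letters-or-digits. The sketch below compares `isalpha()`, `isdigit()` and `isalnum()` on a few made-up sample strings; note that a space makes all three return False.

```python
# hypothetical sample strings
samples = ["apple", "apple5", "12345", "two words"]

# map each string to (isalpha, isdigit, isalnum)
results = {s: (s.isalpha(), s.isdigit(), s.isalnum()) for s in samples}

for s, checks in results.items():
    print(s, checks)
```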

## 2. Common operations on files 

#### Duplicate all the files with extension *.txt to files with extension *.out

In [5]:
# duplicate all the files with extension *.txt to files with extension *.out

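import glob
import os

list_of_files = glob.glob('./*.txt')           # create the list of files
for file_name in list_of_files:
  # swap only the extension, so a 'txt' elsewhere in the path is left alone
  out_name = os.path.splitext(file_name)[0] + '.out'
  # 'with' closes both files even if an error occurs
  with open(file_name, 'r') as FI, open(out_name, 'w') as FO:
    for line in FI:
      FO.write(line)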
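
The same duplication can also be sketched with `pathlib` and `shutil`. This is one possible alternative, not the only way to do it; the temporary directory and the file name `notes.txt` are made up so that the example runs anywhere without touching real files.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # hypothetical input file created just for this example
    (folder / "notes.txt").write_text("first line\nsecond line\n")

    # list() freezes the matches before new files are created in the folder
    for src in list(folder.glob("*.txt")):
        dst = src.with_suffix(".out")   # replaces only the extension
        shutil.copyfile(src, dst)

    copied = sorted(p.name for p in folder.iterdir())
    copy_content = (folder / "notes.out").read_text()

print(copied)
```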

**Printing out the names of the files contained in a directory**

In [7]:
import os

with os.scandir('.') as entries:
    for entry in entries:
        print(entry.name)

CS_pandas_visualisation.ipynb
.ipynb_checkpoints
CS_pandas.ipynb
eli5.json
CS_spacy.ipynb
food.csv
CS_python.ipynb
08_X_pandas_stats_viz.ipynb
CS_stanza.ipynb
wkp


**Open a file and store its content in a string**

In [None]:
# infile is a reference to a file object
with open("ameliepoulain.txt") as infile:
    file_content = infile.read()
    print("STRING:", file_content)

**Open a file and store its lines into a list**
- The **readlines** method returns the contents of the entire file as a list of strings, where each item in the list represents one line of the file.
- The **readline** method reads one line from the file and returns it as a string. The string returned by readline will contain the newline character at the end. 

In [None]:
with open("../data/amelie-with-lines.txt") as f:
    lines = f.readlines()
    for line in lines:
        print("LIST ITEM:", line)
        

In [None]:
with open("../data/amelie-with-lines.txt") as f:
    line = f.readline()
    print("LIST ITEM:", line)
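
Because each line keeps its trailing newline, it is common to strip it while reading. The sketch below iterates over the file object directly, which also avoids calling `readlines`; the temporary file and its content are made up so that the example is self-contained.

```python
import os
import tempfile

# create a small throwaway file (hypothetical content)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line one\nline two\n")
    path = tmp.name

# iterating over the file object yields one line at a time
with open(path) as f:
    stripped = [line.rstrip("\n") for line in f]

os.remove(path)
print(stripped)
```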
        

**Writing to a file**

https://www.w3schools.com/python/python_file_write.asp

In [None]:
# write to file; 'with' closes the file automatically
with open("demofile2.txt", "w") as f:
    f.write("Now the file has more content!")

# open and read the file after writing
with open("demofile2.txt", "r") as f:
    print(f.read())

## Common operations on texts

**Removing punctuation** 

In [None]:
# Define a translation table that maps each punctuation sign to the empty string
# i.e., that deletes punctuation signs
import string
translator = str.maketrans('', '', string.punctuation)

text = 'string with "punctuation" inside of it! Does this work? I hope so.'

# Apply the translation table to a string
# This deletes all punctuation signs in that string
text.translate(translator)

**Lowercasing a token**

In [None]:
token = "Amelie"
print(token.lower())
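
Punctuation removal and lowercasing are often chained before splitting a text into tokens. The helper below is a minimal sketch of such a chain; the function name `normalize` and the sample sentence are illustrative choices, not part of any library.

```python
import string

def normalize(text):
    """Lowercase the text, delete punctuation, and split on whitespace."""
    translator = str.maketrans('', '', string.punctuation)
    return text.lower().translate(translator).split()

tokens = normalize('Does this work? I hope so.')
print(tokens)
```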

**Pretty printing**

A formatted string literal or f-string is a string literal that is prefixed with 'f' or 'F'. These strings may contain replacement fields, which are expressions delimited by curly braces {}. While other string literals always have a constant value, formatted strings are really expressions evaluated at run time.

Some examples of formatted string literals:


In [None]:
name = "Fred"
print(f"He said his name is {name}.")
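
Replacement fields can hold any expression and may carry a format specification after a colon. The sketch below shows a few common ones; the variables `score` and `count` are made up for illustration.

```python
name = "Fred"
score = 0.87654   # hypothetical value
count = 7         # hypothetical value

# <6 left-aligns in 6 characters, .2f keeps two decimals, 03d pads with zeros
line = f"{name:<6}|{score:.2f}|{count:03d}"
# any expression is allowed inside the braces
total = f"{count} + 3 = {count + 3}"

print(line)
print(total)
```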

## Common operations on dictionaries

#### Sorting a dictionary by values

In [None]:
d = {"Tom":67, "Tina": 54, "Akbar": 87, "Kane": 43, "Divya":73}
# sort the (key, value) pairs by value
sorted_items = sorted(d.items(), key=lambda item: item[1])
# create a sorted dictionary from the sorted pairs (dicts keep insertion order since Python 3.7)
sortdict = dict(sorted_items)
print(sortdict)

**Creating a dictionary from a list of pairs**

In [None]:
d = dict([(1,'a'),(3,'b')])

#### Creating a dictionary from two lists

In [None]:
# create a list with student name
name = ['sravan', 'ojaswi', 'rohith', 'gnanesh', 'bobby']
 
# create a list with student age
age = [23, 21, 32, 11, 23]
 
# using dict method with zip()
dict(zip(name, age))

#### Creating a dictionary from a list of pairs with a dict comprehension

In [None]:
# create a list of (name, age) pairs
data = [('sravan', 23), ('ojaswi', 15),
        ('rohith', 8), ('gnanesh', 4), ('bobby', 20)]


# build the dictionary with a dict comprehension
{key: value for (key, value) in data}


**Creating a dictionary using collections.defaultdict**

In [None]:
import collections

# Each unseen token receives the current vocabulary size as its default value (0, 1, 2, ...)
token2int = collections.defaultdict(lambda: len(token2int)) 

# Create the dictionary from a list of tokens: looking a token up is enough to register it
for token in ["The","woman", "put","the","book","on","the","table"]:
    token2int[token]

# Print it out
token2int.items()

**Getting the elements of a dictionary into a list**

In [None]:
list(d.items())  # items() alone returns a view object, not a list

**Getting the keys of a dictionary**

In [None]:
d.keys()

**Inverting a dictionary**

In [None]:
token2int = dict([('cat',1),('dog',0)])
int2token = dict((i,t) for (t,i) in token2int.items())
print('token2int',token2int.items())
print('int2token',int2token.items())

**Printing out the key-value pairs of a dictionary**

In [None]:
for key, value in token2int.items():
    print( '{} : {}'.format( key, value ) )

#### Create a frequency distribution from a string or list
- each time a new key is found, `defaultdict(int)` adds it to the dictionary with the default value 0, which is then incremented

In [None]:
from collections import defaultdict

s = 'mississippi'
d = defaultdict(int)
for k in s:
    d[k] += 1
d
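
The standard library also offers `collections.Counter`, which builds the same frequency distribution in one call. A minimal sketch on the same string:

```python
from collections import Counter

# Counter counts the occurrences of each item of an iterable
counts = Counter('mississippi')

print(counts)
# most_common(n) returns the n most frequent (item, count) pairs
print(counts.most_common(2))
```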

## Common operations on tuples or lists

**Zipping tuples or lists**

The zip() function returns a zip object, which is an iterator of tuples: the first items of all the passed iterables are paired together, then the second items, and so on.

If the passed iterables have different lengths, the shortest one decides the length of the result.

In [None]:
a = ("John", "Charles", "Mike")
b = ("Jenny", "Christy", "Monica")

zip(a, b)

In [None]:

[x for x in list(zip(a,b))]

In [None]:
[x for x in list(zip(a,b))[0]]

In [None]:
numbers = [1, 2, 3]
letters = ['a', 'b', 'c']
zipped = zip(numbers, letters)
zipped  # Holds an iterator object
# convert iterator to list
list(zipped)

In [None]:
data = [(1,'sravan'),(2,'ojaswi'),
        (3,'bobby'),(4,'rohith'),
        (5,'gnanesh')]
  
# unzip the pairs and keep the first element of each pair
print(list(zip(*data))[0])
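
The truncation to the shortest input can be seen with two lists of different lengths. When the leftover items should be kept, `itertools.zip_longest` pads the shorter input instead; the lists `short` and `longer` are made up for illustration.

```python
from itertools import zip_longest

short = [1, 2]              # hypothetical data
longer = ['a', 'b', 'c']    # hypothetical data

truncated = list(zip(short, longer))                        # stops after 2 pairs
padded = list(zip_longest(short, longer, fillvalue=None))   # pads with None

print(truncated)
print(padded)
```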

In [None]:
def foo(a, b=None, c=None):
   print(a, b, c)

foo([1, 2, 3])  # the whole list is bound to a; b and c stay None

In [None]:
foo(*[1, 2, 3])  # the star unpacks the list: a=1, b=2, c=3

**Slicing lists**
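
A slice has the form `sequence[start:stop:step]`, where `stop` is excluded and each part is optional. The sketch below shows a few typical slices on a made-up list.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']   # hypothetical data

first_three = letters[:3]    # from the start up to index 3 (excluded)
last_two = letters[-2:]      # negative indices count from the end
middle = letters[1:4]        # indices 1, 2 and 3
every_other = letters[::2]   # a step of 2 skips every second item

print(first_three, last_two, middle, every_other)
```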

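**Reversing a list**

In [None]:
import numpy as np

# the slice [::-1] walks the sequence backwards; it works on lists and NumPy arrays alike
lst = [1, 2, 3, 4, 5, 6, 7]
arr = np.array(lst)

print(lst[::-1])
print(arr[::-1])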


**Creating a list of integers from 0 to 9**

In [None]:
print(list(range(10)))

**Using the enumerate function to create (position, element) pairs from a list of elements**

https://book.pythontips.com/en/latest/enumerate.html

In [None]:
l = ["a","b","c"]
# a list keeps the pairs in order (curly braces would build an unordered set)
[(i, t) for i, t in enumerate(l)]

### Printing

**Format method**

- The format() method formats the specified value(s) and inserts them into the string's placeholders.
- A placeholder is defined using curly brackets: {}. It can be empty, numbered ({0}), or named ({fname}).
- The format() method returns the formatted string.

In [None]:
txt1 = "My name is {fname}, I'm {age}".format(fname = "John", age = 36)
txt2 = "My name is {0}, I'm {1}".format("John",36)
txt3 = "My name is {}, I'm {}".format("John",36)

print(txt1)
print(txt2)
print(txt3)

**Star operator**

Writing print(*text) is equivalent to writing print(text[0], text[1], ..., text[n]): each item is printed with a space in between.
The asterisk unpacks all the items of the list into separate arguments of the print function, so we do not need to specify how many items the list holds.

You can read more about it here: https://treyhunner.com/2018/10/asterisks-in-python-what-they-are-and-how-to-use-them/

In [None]:
print(*['jdoe is', 42, 'years old'])
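
Since the unpacked items become ordinary arguments of `print`, the `sep` keyword controls what is printed between them. The sketch below also builds the same string with `join`, which needs an explicit conversion to `str`; the variable names are illustrative.

```python
parts = ['jdoe is', 42, 'years old']

print(*parts)             # same as print(parts[0], parts[1], parts[2])
print(*parts, sep='-')    # a custom separator instead of a space

# join only accepts strings, so each item is converted first
joined = ' '.join(str(p) for p in parts)
print(joined)
```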

### List comprehension

In [None]:
list_of_lists = [['4', '8'], ['4', '2', '28'], ['1', '12'], ['3', '6', '2']]

[int(i) for sublist in list_of_lists for i in sublist]

In [None]:
[[int(j) for j in i] for i in list_of_lists]
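
The flattening comprehension reads left to right in the same order as nested for loops. As a sketch, the loop version below builds the same list as `[int(i) for sublist in list_of_lists for i in sublist]`.

```python
list_of_lists = [['4', '8'], ['4', '2', '28'], ['1', '12'], ['3', '6', '2']]

flat = []
for sublist in list_of_lists:   # first "for" of the comprehension
    for i in sublist:           # second "for" of the comprehension
        flat.append(int(i))     # expression at the head of the comprehension

print(flat)
```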

In [None]:
list_1 = [2, 6, 7, 3]
list_2 = [1, 4, 2]

list_3 = [ (x, y) for x in list_1 for y in list_2 ]

print(list_3)

### Ordered intersection of lists

In [None]:
list_1 = [3,2,1,2]
list_2 = [2,3,4,2]

In [None]:
set_2 = frozenset(list_2)
intersection = [x for x in list_1 if x in set_2]
intersection

In [None]:
set_2
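
The intersection above keeps repeated items of the first list. If each common item should appear only once while preserving order, one option is `dict.fromkeys`, which relies on dictionaries keeping insertion order (Python 3.7+). A minimal sketch on the same lists:

```python
list_1 = [3, 2, 1, 2]
list_2 = [2, 3, 4, 2]

set_2 = frozenset(list_2)
intersection = [x for x in list_1 if x in set_2]   # keeps the repeated 2

# dict keys are unique and remember the order in which they were first seen
unique_intersection = list(dict.fromkeys(intersection))

print(intersection)
print(unique_intersection)
```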

In [None]:
def align(a, b):
    return (a, b)

a = [1, 2, 3]
c = [1, 2, 3, 4]
# map with two iterables calls align on paired items and stops at the shorter one
pairs = map(align, a, c)
for pair in pairs:
    print(pair)