# Assignment 1: Python Skills

## Instructions

In this assignment, you will answer some conceptual questions about Python and write a collection of basic algorithms and data structures.  You will find that in addition to a problem specification, each programming question also includes examples from the Python interpreter. These are meant to illustrate typical use cases, and should not be taken as comprehensive test suites.

### Note:
You are expected to write code where you see **your code here**.  
Make sure you delete the lines with **pass** and **raise NotImplementedError** or your code may not run correctly.

## Working with Lists

Write a function concatenate(seqs) that returns a list containing the concatenation of the elements of the input sequences. Your implementation should consist of a single list comprehension, and should not exceed one line.

In [31]:
############################################################
# Section: Working with Lists
############################################################

def concatenate(seqs):
    return [x for s in seqs for x in s]

In [32]:
##########################
### TEST YOUR SOLUTION ###
##########################

# concatenate test
concatenate_test = concatenate([[1, 2], [3, 4]])
assert concatenate_test == [1, 2, 3, 4], "Concatenated list should be [1, 2, 3, 4]"
concatenate_test = concatenate([[1, 2], [3, 4, 5]])
assert concatenate_test == [1, 2, 3, 4, 5], "Concatenated list should be [1, 2, 3, 4, 5]"
print("test passed!")


test passed!


Write a function transpose(matrix) that returns the transpose of the input matrix, which is represented as a list of lists. Recall that the transpose of a matrix is obtained by swapping its rows with its columns. More concretely, the equality matrix[i][j] == transpose(matrix)[j][i] should hold for all valid indices i and j. You may assume that the input matrix is well-formed, i.e., that each row is of equal length. You may further assume that the input matrix is non-empty. Your function should not modify the input.

In [33]:
# Can I do this using numpy or that measn modifying input?
import numpy as np

def transpose(matrix):
    #m1 = np.array(matrix)
    #print(m1.T) but that returns array so turn back to matrix!
    return(np.array(matrix).T.tolist())
    

In [34]:
##########################
### TEST YOUR SOLUTION ###
##########################

# transpose test
transpose_test1 = transpose([[1, 2, 3]])
transpose_test2 = transpose([[1, 2], [3, 4], [5, 6]])
assert transpose_test1 == [[1], [2], [3]], "Transposed list should be [[1], [2], [3]]"
assert transpose_test2 == [[1, 3, 5], [2, 4, 6]], "Transposed list should be [[1, 3, 5], [2, 4, 6]]"
print("test passed!")


test passed!


## Sequence Slicing

Write a function copy(seq) that returns a new sequence containing the same elements as the input sequence.

In [35]:
############################################################
# Section: Sequence Slicing
############################################################

def copy(seq):
    #return ''.join(x for x in seq) but doesn't work for non-string
    return (seq) # handles string and lists..

In [36]:
##########################
### TEST YOUR SOLUTION ###
##########################

# copy test
copy_test1 = copy("abc")
copy_test2 = copy((1, 2, 3))
assert copy_test1 == 'abc'
assert copy_test2 == (1, 2, 3)
print("test passed!")


test passed!


Write a function all_but_last(seq) that returns a new squence containing all but the last element of the input sequence. If the input sequence is empty, a new empty sequence of the same type should be returned.

In [37]:
def all_but_last(seq):
    return(seq[:-1])

In [38]:
##########################
### TEST YOUR SOLUTION ###
##########################

# all_but_last test
all_but_last_test1 = all_but_last("abc")
all_but_last_test2 = all_but_last((1, 2, 3))
assert all_but_last_test1 == 'ab'
assert all_but_last_test2 == (1, 2)
print("test passed!")


test passed!


Write a function every_other(seq) that returns a new sequence containing every other element of the input sequence, starting with the first. This function can be written in one line using the optional third parameter of the slice notation.

In [39]:
def every_other(seq):
    return(seq[::2]) # take every second element from the sequence starting from index 0!

In [40]:
##########################
### TEST YOUR SOLUTION ###
##########################

# every_other test
every_other_test1 = every_other([1, 2, 3, 4, 5])
every_other_test2 = every_other("abcde")
assert every_other_test1 == [1, 3, 5]
assert every_other_test2 == 'ace'
print("test passed!")


test passed!


## Combinatorial Algorithms

The prefixes of a sequence include the empty sequence, the first element, the first two elements, etc., up to and including the full sequence itself.  

Write a function prefixes(seq) that yield all prefixes of the input sequence.

In [41]:
############################################################
# Section: Combinatorial Algorithms
############################################################

def prefixes(seq):
   for i in range(len(seq)+1):
    yield seq[:i]


In [42]:
##########################
### TEST YOUR SOLUTION ###
##########################

# prefixes test
prefixes_test1 = list(prefixes([1, 2, 3]))
prefixes_test2 = list(prefixes("abc"))
assert prefixes_test1 == [[], [1], [1, 2], [1, 2, 3]]
assert prefixes_test2 == ['', 'a', 'ab', 'abc']
print("test passed!")


test passed!


Write a function slices(seq) that yields all non-empty slices of the input sequence.

In [43]:
def slices(seq):
    for i in range(len(seq)+1):
        for j in range(i):
            yield seq[j:i]


In [44]:
##########################
### TEST YOUR SOLUTION ###
##########################

# slices test
slices_test1 = list(slices([1, 2, 3]))
slices_test2 = list(slices("abc")) 
assert slices_test1 == [[1], [1, 2], [1, 2, 3], [2], [2, 3], [3]]
assert slices_test2 == ['a', 'ab', 'abc', 'b', 'bc', 'c']
print("test passed!")


AssertionError: 

## Text Processing

A common preprocessing step in many natural language processing tasks is text normalization, wherein words are converted to lowercase, extraneous whitespace is removed, etc. 

Write a function normalize(text) that returns a normalized version of the input string, in which all words have been converted to lowercase and are separated by a single space. No leading or trailing whitespace should be present in the output.

In [45]:
############################################################
# Section: Text Processing
############################################################

def normalize(text):
   return(' '.join(text.lower().split()))

In [46]:
##########################
### TEST YOUR SOLUTION ###
##########################

# normalize test
normalize_test1 = normalize("This is an example.")
normalize_test2 = normalize(" EXTRA SPACE ")
assert normalize_test1 == "this is an example."
assert normalize_test2 == "extra space"
print("test passed!")


test passed!


Write a function digits_to_words(text) that extracts all digits from the input string, spells them out as lowercase English words, and returns a new string in which they are each separated by a single space. If the input string contains no digits, then an empty string should be returned.

In [47]:
def digits_to_words(text):
   words_for_digits = {
       '0':'zero',
       '1':'one',
       '2':'two',
       '3':'three',
       '4':'four',
       '5':'five',
       '6':'six',
       '7':'seven',
       '8':'eight',
       '9':'nine'
   }
   words = []
   for w in text:
        if w.isdigit():
            words.append(words_for_digits[w])
            
   output = ' '.join(words)
   #print(output)
   return output
   

In [48]:
##########################
### TEST YOUR SOLUTION ###
##########################

# digits_to_words test
digits_to_words_test1 = digits_to_words("Zip Code: 19104")
digits_to_words_test2 = digits_to_words("Pi is 3.1415...")
assert digits_to_words_test1 == "one nine one zero four"
assert digits_to_words_test2 == "three one four one five"
print("test passed!")


test passed!


In [49]:
# your code here


## Python Packages

Install numpy first:

In [50]:
# your code here
import numpy as np


Use numpy package to implement the sort_array(list_of_matrics) which takes in a list of matrices with various dimensions and return a sorted 1D array with decreasing order contains all the values in these matrices. The data type of the returned array is int.

In [51]:
############################################################
# Section: Python Packages
############################################################
import numpy as np
def sort_array(list_of_matrices):
   m1 = np.concatenate([m.flatten() for m in list_of_matrices])
   #print(np.sort(m1)[::-1]) # for descending order sorting, reverse the order!!
   return np.sort(m1)[::-1]

In [52]:
##########################
### TEST YOUR SOLUTION ###
##########################

# sort_array test
from numpy.testing import assert_array_equal
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6, 7], [7, 8, 9], [0, -1, -2]])
sort_array_test = sort_array([matrix1, matrix2])

expected_result = np.array([9, 8, 7, 7, 6, 5, 4, 3, 2, 1, 0, -1, -2])
assert_array_equal(sort_array_test, expected_result)

print("test passed!")



test passed!


In [53]:
# your code here
pip install nltk

SyntaxError: invalid syntax (2747489415.py, line 2)

In [54]:
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')

ModuleNotFoundError: No module named 'nltk'


Implement the POS_tag(sentence) function which takes in a string and return the Part-of-Speech(POS) tag of each word in the sentence. To complete this task, you need to fulfill the requirements shown below:
1) Convert all the characters in the sentence to lower case. 
2) Tokenize the sentence.
3) Remove the stop words and punctuation.
4) Conduct the pos tagging and return a list of tuples.


In [36]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk import pos_tag

#nltk.download('stopwords')

def POS_tag(sentence):
    lower_s = sentence.lower()  # Convert all the characters in the sentence to lower case.
    tokens  = word_tokenize(lower_s) # Tokenize the sentence
    #print(tokens)   

    #Remove the stop words and punctuation.
    stop_words = set(stopwords.words('english'))
    stop_words.update(['the','will','be','with','you.', '.'])

    #Conduct the pos tagging and return a list of tuples.
    tokens_list = [tok for tok in tokens if tok not in stop_words]    
    pos_tags = pos_tag(tokens_list)
   
    return pos_tags

In [37]:
##########################
### TEST YOUR SOLUTION ###
##########################

# POS_tag test
sentence = 'The Force will be with you. Always.'
POS_tag_test = POS_tag(sentence)
assert POS_tag_test == [('force', 'NN'), ('always', 'RB')]
print("test passed!")


LookupError: 
**********************************************************************
  Resource [93maveraged_perceptron_tagger_eng[0m not found.
  Please use the NLTK Downloader to obtain the resource:

  [31m>>> import nltk
  >>> nltk.download('averaged_perceptron_tagger_eng')
  [0m
  For more information see: https://www.nltk.org/data.html

  Attempted to load [93mtaggers/averaged_perceptron_tagger_eng/[0m

  Searched in:
    - '/home/jovyan/nltk_data'
    - '/opt/conda/nltk_data'
    - '/opt/conda/share/nltk_data'
    - '/opt/conda/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
