In the cell below, create a Python function that wraps your previous solution for the Bag of Words lab.

Requirements:

1. Your function should accept the following parameters:
    * `docs` [REQUIRED] - array of document paths.
    * `stop_words` [OPTIONAL] - array of stop words. The default value is an empty array.

1. Your function should return a Python object that contains the following:
    * `bag_of_words` - array of strings of normalized unique words in the corpus.
    * `term_freq` - array of the term-frequency vectors.

In [79]:
# Import required libraries

import re 

docs = ['Ironhack is cool!', 'I love Ironhack.','I am a student at Ironhack.']
# Define function

def get_bow_from_docs(docs, stop_words=[]):
    
 # In the function, first define the variables you will use such as `corpus`, `bag_of_words`, and `term_freq`.   
    
    #adding intermediate-step lists in order to prepare the data for "corpus"
    doc_sub1 = [re.sub('[,.!]','',doc.lower()) for doc in docs]
    doc_sub2 = [doc1.split()for doc1 in doc_sub1]
    
    corpus = [d2 for doc2 in doc_sub2 for d2 in doc2]
    
    #adding words to bag_of_words and converting the list to set(make the values unique, then back to list
    bag_of_words = list(set([word for word in corpus if word not in stop_words]))
    
    #defining term frequency 
    term_freq = []
    
    for item in doc_sub2:
        term_freq_item =[]
        for word in bag_of_words:
            frq_word = item.count(word)
            term_freq_item.append(frq_word)
        term_freq.append(term_freq_item)
        
    """
        Loop `docs` and read the content of each doc into a string in `corpus`.
        Remember to convert the doc content to lowercases and remove punctuation.
        """



    """
        Loop `corpus`. Append the terms in each doc into the `bag_of_words` array. The terms in `bag_of_words` 
        should be unique which means before adding each term you need to check if it's already added to the array.
        In addition, check if each term is in the `stop_words` array. Only append the term to `bag_of_words`
        if it is not a stop word.
        """




    """
        Loop `corpus` again. For each doc string, count the number of occurrences of each term in `bag_of_words`. 
        Create an array for each doc's term frequency and append it to `term_freq`.
    """


    
    # Now return your output as an object
    return {
        "bag_of_words": bag_of_words,
        "term_freq": term_freq
    }


{'bag_of_words': ['student', 'a', 'am', 'cool', 'love', 'is', 'i', 'at', 'ironhack'], 'term_freq': [[0, 0, 0, 1, 0, 1, 0, 0, 1], [0, 0, 0, 0, 1, 0, 1, 0, 1], [1, 1, 1, 0, 0, 0, 1, 1, 1]]}


Test your function without stop words. You should see the output like below:

```{'bag_of_words': ['ironhack', 'is', 'cool', 'i', 'love', 'am', 'a', 'student', 'at'], 'term_freq': [[1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 0, 0, 1, 1, 0, 0, 0, 0], [1, 0, 0, 1, 0, 1, 1, 1, 1]]}```

In [80]:
# Define doc paths array
docs = ['Ironhack is cool!', 'I love Ironhack.','I am a student at Ironhack.']

# Obtain BoW from your function
bow = get_bow_from_docs(docs)

# Print BoW
print(bow)

{'bag_of_words': ['student', 'a', 'am', 'cool', 'love', 'is', 'i', 'at', 'ironhack'], 'term_freq': [[0, 0, 0, 1, 0, 1, 0, 0, 1], [0, 0, 0, 0, 1, 0, 1, 0, 1], [1, 1, 1, 0, 0, 0, 1, 1, 1]]}


If your attempt above is successful, nice work done!

Now test your function again with the stop words. In the previous lab we defined the stop words in a large array. In this lab, we'll import the stop words from Scikit-Learn.

In [87]:
import sys
!{sys.executable} -m pip install sklearn

from sklearn.feature_extraction import stop_words
print(stop_words.ENGLISH_STOP_WORDS)

frozenset({'hers', 'and', 'noone', 'whereupon', 'hereafter', 'up', 'six', 'being', 'how', 'rather', 'though', 'then', 'whoever', 'except', 'move', 'seeming', 'your', 'interest', 'etc', 'beside', 'must', 'anywhere', 'we', 'him', 'afterwards', 'amoungst', 'empty', 'been', 'own', 'they', 'around', 'top', 'these', 'has', 'otherwise', 'indeed', 'most', 'thru', 'me', 'find', 'is', 'due', 'twelve', 'myself', 'thick', 'former', 'who', 'whereby', 'sometime', 'latterly', 'detail', 'often', 'therein', 'cry', 'namely', 'becoming', 'seem', 'somehow', 'go', 'bill', 'beforehand', 'several', 'by', 'give', 'get', 'out', 'or', 'put', 'cant', 'become', 'so', 'a', 'forty', 'name', 'mostly', 'towards', 'fire', 'he', 'one', 'fifty', 'after', 'when', 'seems', 'nobody', 'of', 'itself', 'below', 'nothing', 'all', 'thereafter', 'as', 'during', 'whatever', 'until', 'few', 'please', 'never', 'anyhow', 'those', 'anyway', 'even', 'see', 'beyond', 'against', 'becomes', 'others', 'above', 'while', 'hasnt', 'you', 'un

You should have seen a large list of words that looks like:

```frozenset({'across', 'mine', 'cannot', ...})```

`frozenset` is a type of Python object that is immutable. In this lab you can use it just like an array without conversion.

Next, test your function with supplying `stop_words.ENGLISH_STOP_WORDS` as the second parameter.

In [90]:
bow = get_bow_from_docs(bow, stop_words.ENGLISH_STOP_WORDS)

print(bow)

# I am not sure why the parameter should be bow intead of docs... If I change it to docs, it work-ish

{'bag_of_words': ['bag_of_words', 'term_freq'], 'term_freq': [[1, 0], [0, 1]]}


You should have seen:

```{'bag_of_words': ['ironhack', 'cool', 'love', 'student'], 'term_freq': [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]}```