In the cell below, create a Python function that wraps your previous solution for the Bag of Words lab.

Requirements:

1. Your function should accept the following parameters:
    * `docs` [REQUIRED] - array of document paths.
    * `stop_words` [OPTIONAL] - array of stop words. The default value is an empty array.

1. Your function should return a Python object that contains the following:
    * `bag_of_words` - array of strings of normalized unique words in the corpus.
    * `term_freq` - array of the term-frequency vectors.
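
To make the shape of `term_freq` concrete, here is a small hand-built illustration. The bag and the sample sentence are made up for this sketch and are not part of the lab data. Each vector has one entry per word in `bag_of_words`, holding how many times that word occurs in the document.

```python
# Hypothetical bag of words and document, used only for illustration.
bag_of_words = ["ironhack", "is", "cool"]
doc_terms = "cool ironhack is cool".split()

# One count per bag word, in the same order as `bag_of_words`.
vector = [doc_terms.count(word) for word in bag_of_words]
print(vector)  # [1, 1, 2]
```

In the lab documents no word repeats within a document, which is why the expected vectors contain only 0s and 1s.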

In [23]:
import pandas as pd
import numpy as np

    
# Define function
# In the function, first define the variables you will use such as `corpus`, `bag_of_words`, and `term_freq`.
#
# Loop `corpus`. Append the terms in each doc into the `bag_of_words` array. The terms in `bag_of_words`
# should be unique, which means before adding each term you need to check if it's already added to the array.
# In addition, check if each term is in the `stop_words` array. Only append the term to `bag_of_words`
# if it is not a stop word.

def get_bow_from_docs(docs, stop_words=None):
    # Avoid a mutable default argument; a missing value means "no stop words".
    if stop_words is None:
        stop_words = []

    corpus = []
    bag_of_words = []
    term_freq = []
    corpus_clean = []

    # Read the full text of each document.
    for document in docs:
        with open(document, 'r') as f:
            corpus.append(f.read())

    # Normalize: lowercase, strip periods, and split on any whitespace.
    for doc in corpus:
        corpus_clean.append(doc.lower().replace(".", "").split())

    # Collect unique terms in order of first appearance, skipping stop words.
    for terms in corpus_clean:
        for w in terms:
            if w in stop_words:
                continue
            if w not in bag_of_words:
                bag_of_words.append(w)

    # Count how many times each bag term occurs in each document.
    for terms in corpus_clean:
        storing_list = []
        for word in bag_of_words:
            storing_list.append(terms.count(word))
        term_freq.append(storing_list)
    

    return {
        "bag_of_words": bag_of_words,
        "term_freq": term_freq
    }

    

Test your function without stop words. You should see output like the following:

```{'bag_of_words': ['ironhack', 'is', 'cool', 'i', 'love', 'am', 'a', 'student', 'at'], 'term_freq': [[1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 0, 0, 1, 1, 0, 0, 0, 0], [1, 0, 0, 1, 0, 1, 1, 1, 1]]}```

In [24]:
# Define doc paths array
docs = ['doc1-Copy1.txt', 'doc2-Copy1.txt', 'doc3-Copy1.txt']

# Obtain BoW from your function
bow = get_bow_from_docs(docs)

# Print BoW
print(bow)

{'bag_of_words': ['ironhack', 'is', 'cool', 'i', 'love', 'am', 'a', 'student', 'at'], 'term_freq': [[1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 0, 0, 1, 1, 0, 0, 0, 0], [1, 0, 0, 1, 0, 1, 1, 1, 1]]}


If your attempt above is successful, nice work!
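
If you want to check the logic without files on disk, the same steps can be sketched on in-memory strings. The helper name `bow_from_texts` is hypothetical, and the three sentences are assumed sample inputs inferred from the expected output above, not the verified contents of the lab files. This is one possible sketch, not the required solution.

```python
def bow_from_texts(texts, stop_words=()):
    """Hypothetical helper: build a bag of words from strings instead of file paths."""
    # Normalize: lowercase, drop periods, split on whitespace.
    tokenized = [t.lower().replace(".", "").split() for t in texts]

    # Unique terms in order of first appearance, skipping stop words.
    bag_of_words = []
    for terms in tokenized:
        for term in terms:
            if term not in stop_words and term not in bag_of_words:
                bag_of_words.append(term)

    # One count vector per document, aligned with `bag_of_words`.
    term_freq = [[terms.count(word) for word in bag_of_words] for terms in tokenized]
    return {"bag_of_words": bag_of_words, "term_freq": term_freq}


# Assumed sample sentences (not read from the lab files).
texts = ["Ironhack is cool.", "I love Ironhack.", "I am a student at Ironhack."]
result = bow_from_texts(texts)
print(result)
```

Working on strings keeps the text processing separate from file reading, which makes the function easier to test.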

Now test your function again with the stop words. In the previous lab we defined the stop words in a large array. In this lab, we'll import the stop words from Scikit-Learn.

In [25]:
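# Older scikit-learn releases expose a `stop_words` module; newer releases removed it
# and keep ENGLISH_STOP_WORDS in `sklearn.feature_extraction.text` instead.
try:
    from sklearn.feature_extraction import stop_words
except ImportError:
    from sklearn.feature_extraction import text as stop_words
print(stop_words.ENGLISH_STOP_WORDS)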

frozenset({'than', 'first', 'well', 'being', 'our', 'i', 'of', 'between', 'nobody', 'its', 'empty', 'this', 'keep', 'whatever', 'further', 'thru', 'from', 'became', 'amount', 'upon', 'by', 'so', 'ours', 'can', 'to', 'elsewhere', 'her', 'down', 'for', 'she', 'inc', 'fifteen', 'couldnt', 'anyone', 'one', 'without', 'nevertheless', 'through', 'un', 'side', 'under', 'been', 'each', 'serious', 're', 'should', 'never', 'although', 'every', 'too', 'any', 'meanwhile', 'the', 'latterly', 'show', 'something', 'and', 'my', 'but', 'wherever', 'whereafter', 'ie', 'what', 'such', 'ltd', 'eight', 'here', 'which', 'hasnt', 'seem', 'sometimes', 'whereby', 'nor', 'him', 'nothing', 'somehow', 'beyond', 'give', 'until', 'get', 'put', 'latter', 'may', 'still', 'am', 'made', 'other', 'part', 'go', 'is', 'cannot', 'their', 'had', 'whereas', 'since', 'except', 'are', 'describe', 'via', 'anything', 'six', 'eg', 'could', 'twelve', 'below', 'found', 'formerly', 'we', 'however', 'forty', 'becoming', 'nowhere', ...})

You should have seen a large list of words that looks like:

```frozenset({'across', 'mine', 'cannot', ...})```

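`frozenset` is Python's immutable set type: once created, its members cannot be added or removed. Membership tests with `in` work the same way as on a list, so in this lab you can pass it to your function without conversion.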
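
As a quick illustration of how a `frozenset` behaves, here is a minimal sketch using a tiny made-up stop word set (an assumption for the example, not the real scikit-learn list):

```python
# Tiny assumed sample, not the real ENGLISH_STOP_WORDS.
stop_words_demo = frozenset({"i", "is", "am", "a", "at"})

# Membership tests read exactly like they do for a list.
print("is" in stop_words_demo)
print("ironhack" in stop_words_demo)

# A frozenset has no add() method, so trying to mutate it raises AttributeError.
mutation_error = None
try:
    stop_words_demo.add("the")
except AttributeError as exc:
    mutation_error = type(exc).__name__
print(mutation_error)

# Filtering terms against the set, as the lab function does with `stop_words`.
kept = [w for w in ["ironhack", "is", "cool"] if w not in stop_words_demo]
print(kept)
```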

Next, test your function by supplying `stop_words.ENGLISH_STOP_WORDS` as the second parameter.

In [28]:
bow = get_bow_from_docs(docs, stop_words.ENGLISH_STOP_WORDS)

print(bow)

{'bag_of_words': ['ironhack', 'cool', 'love', 'student'], 'term_freq': [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]}


You should have seen:

```{'bag_of_words': ['ironhack', 'cool', 'love', 'student'], 'term_freq': [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]}```