# Data Upload
The purpose of this notebook is to use the Python library sec_edgar_downloader to request company filings from the SEC.gov website. I want to focus on the quarterly reports of companies listed on the NYSE; there are 2,800 companies in that list. Once I have the files, I will clean the documents up to prepare them for the modeling phase. The plan is to use 19 companies.

**About the data:** The data was obtained from a list of companies registered with the SEC. In a scrap notebook I created a dataframe out of that list and cleaned it up. I dropped three columns, which led to the loss of 30,000 companies. This was fine since the list contained 759,377 companies.

Once I had the cleaned dataframe, I then searched for NYSE Companies within the data and created a new dataset. That smaller dataset is what you see being uploaded to this notebook. A link for the original list of SEC Registered companies will be provided in the resource section below. 

## Objectives:

- Download needed documentation from SEC using edgar
- Filter documentation to only contain NYSE company files
- Clean and preprocess files for the modeling phase

# Bringing In the Data With Small Adjustments
In this section you will see the data frame created for 19 NYSE companies of my choice, with their corresponding CIK numbers. You will then see the creation of an additional column that holds each company's ticker symbol.

In [1]:
# Data Importing and Manipulation
import numpy as np
import pandas as pd

# Web Scraping
from sec_edgar_downloader import Downloader
from bs4 import BeautifulSoup

# String Manipulation
import re
import unicodedata

In [7]:
# Importing the data
df = pd.read_csv('SEC Registered NYSE US Companies Down Sampled', index_col='Name')

In [9]:
df.columns # Checking the columns

Index(['Unnamed: 0', 'CIK_Num'], dtype='object')

In [11]:
df.drop(columns='Unnamed: 0', inplace=True) # Getting rid of extra column

In [13]:
# Creating a new column with each company's ticker symbol
df['Ticker'] = ['MMM', 'ABT', 'ACN', 'ALGT',
                'BKR', 'BAX', 'BA', 'DVN',
                'GS', 'GS', 'JNJ', 'MA',
                'MS', 'PM', 'PRU', 'RTX',
                'SPG', 'SKYW', 'SO']

In [22]:
# Checking work
print('Column Names:')
print(df.columns)
print('\n')
print('Data Quick View:')
df.head()

Column Names:
Index(['CIK_Num', 'Ticker'], dtype='object')


Data Quick View:


Name,CIK_Num,Ticker
3M Co,66740,MMM
Abbott Laboratories,1800,ABT
Accenture Plc,1467373,ACN
Allegiant Travel Co,1362468,ALGT
Baker Hughes Co,1701605,BKR
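
Because the ticker column is filled from a positional list, a quick check on that list can catch a length mismatch or a repeated symbol before it reaches the data frame. The sketch below is one possible way to do that with the standard library only; the helper name `check_tickers` is hypothetical, and `expected_rows=19` stands in for `len(df)` in the notebook.

```python
from collections import Counter


def check_tickers(tickers, expected_rows):
    """Return (length_ok, duplicates) for a positional ticker list."""
    counts = Counter(tickers)
    # Symbols that appear more than once, sorted for stable output
    duplicates = sorted(t for t, n in counts.items() if n > 1)
    return len(tickers) == expected_rows, duplicates


# Same list that is assigned to df['Ticker'] above
tickers = ['MMM', 'ABT', 'ACN', 'ALGT', 'BKR', 'BAX', 'BA', 'DVN',
           'GS', 'GS', 'JNJ', 'MA', 'MS', 'PM', 'PRU', 'RTX',
           'SPG', 'SKYW', 'SO']

# In the notebook, expected_rows would be len(df); 19 is assumed here
length_ok, duplicates = check_tickers(tickers, expected_rows=19)
print(length_ok, duplicates)
```

A repeated symbol may be intentional (for example, two registrants in the source list sharing one ticker), so the check only reports it and leaves the decision to the reader.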


# Company Document Downloads
This section downloads the documents that will be used for sentiment analysis. The document of focus is the 10-Q, or quarterly report, of each company in the dataset for the years 2016-2020. This section of code was obtained from Bryan Arnold. The documentation for the code used is located in the resource section of the notebook.

In [23]:
# Initialize a downloader instance.
# If no argument is passed to the constructor, the package
# will attempt to locate the user's downloads folder.
# I gave it the absolute path to my project folder

dl = Downloader("/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone")

**3M Co: MMM**

This is the first attempt at obtaining the data needed from the downloaded documents. Below you will see this conducted on one of the 10-Qs downloaded for 3M Co. If this proves to be successful, the next step will be an automated way to obtain the remaining data from each downloaded document.

In [33]:
# # Get all 10-Q filings (ticker: MMM )
# dl.get("10-Q", "MMM", after_date="20160101")

13
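
The per-company download cells that follow all repeat the same call, so they could be wrapped in a loop. The sketch below is a minimal illustration, not the method used in this notebook: `download_all` is a hypothetical helper, and the call shape `get("10-Q", ticker, after_date=...)` is copied from the commented-out cells here. Other releases of the package may expect a different signature, so check the installed version. The cell output above suggests the call returns the number of filings saved, which is what the helper collects.

```python
def download_all(downloader, tickers, filing_type="10-Q", after_date="20160101"):
    """Call downloader.get once per ticker and collect what it returns.

    `downloader` is assumed to expose get(filing_type, ticker, after_date=...),
    matching the calls shown in this notebook.
    """
    results = {}
    for ticker in tickers:
        try:
            results[ticker] = downloader.get(filing_type, ticker, after_date=after_date)
        except Exception as exc:
            # Record the failure and keep going so one company does not stop the run
            results[ticker] = exc
    return results


# Hypothetical usage in the notebook:
# counts = download_all(dl, df['Ticker'].unique())
```

Collecting exceptions instead of raising keeps a long download run from stopping halfway; the returned dictionary can then be inspected for tickers that need a retry.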

In [58]:
# Reading in the document
PATH = "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MMM/10-Q/0001558370-16-005213.txt"
with open(PATH, "r", encoding="utf-8") as file:
    text = file.read()

In [59]:
soup = BeautifulSoup(text, 'lxml') # Parsing document

In [60]:
# Viewing header of document 
sec_header_tag = soup.find('sec-header')
display(sec_header_tag)

<sec-header>0001558370-16-005213.hdr.sgml : 20160503
<acceptance-datetime>20160503165847
ACCESSION NUMBER:		0001558370-16-005213
CONFORMED SUBMISSION TYPE:	10-Q
PUBLIC DOCUMENT COUNT:		92
CONFORMED PERIOD OF REPORT:	20160331
FILED AS OF DATE:		20160503
DATE AS OF CHANGE:		20160503

FILER:

	COMPANY DATA:	
		COMPANY CONFORMED NAME:			3M CO
		CENTRAL INDEX KEY:			0000066740
		STANDARD INDUSTRIAL CLASSIFICATION:	SURGICAL &amp; MEDICAL INSTRUMENTS &amp; APPARATUS [3841]
		IRS NUMBER:				410417775
		STATE OF INCORPORATION:			DE
		FISCAL YEAR END:			1231

	FILING VALUES:
		FORM TYPE:		10-Q
		SEC ACT:		1934 Act
		SEC FILE NUMBER:	001-03285
		FILM NUMBER:		161616219

	BUSINESS ADDRESS:	
		STREET 1:		3M CENTER
		STREET 2:		BLDG. 220-11W-02
		CITY:			ST PAUL
		STATE:			MN
		ZIP:			55144-1000
		BUSINESS PHONE:		6517332204

	MAIL ADDRESS:	
		STREET 1:		3M CENTER
		STREET 2:		BLDG. 220-11W-02
		CITY:			ST. PAUL
		STATE:			MN
		ZIP:			55144-1000

	FORMER COMPANY:	
		FORMER CONFORMED NAME:	MINNESOTA 

### CLEANING OF THE MMM IMPORTED DOCUMENT

In [62]:
# Remove all script and style elements
for script in soup(["script", "style"]):
    script.extract()

In [63]:
# Assign what's left to a string
pageText = soup.body.get_text()

In [94]:
pageText = unicodedata.normalize("NFKD", pageText) # Normalizing text format

In [65]:
# Removing punctuation characters from the document
pageText = "".join(c for c in pageText if c not in '!"#$%&\'()*+,./:;<=>?@[\\]^_`{|}~') 
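
Since the same normalization and punctuation removal will be needed for every downloaded filing, the two steps can be bundled into one function. The sketch below uses only the standard library and the same character set as the cell above; the name `clean_text` is hypothetical, and `str.translate` is swapped in for the character-by-character join as a faster equivalent.

```python
import unicodedata

# Same characters the cell above removes (the hyphen is not among them)
PUNCTUATION = '!"#$%&\'()*+,./:;<=>?@[\\]^_`{|}~'
_STRIP_TABLE = str.maketrans('', '', PUNCTUATION)


def clean_text(raw_text):
    """Normalize Unicode to NFKD, then drop the punctuation listed above."""
    normalized = unicodedata.normalize("NFKD", raw_text)
    return normalized.translate(_STRIP_TABLE)


# The non-breaking space becomes a regular space under NFKD
print(clean_text("Net sales\u00a0increased 3.2% (unaudited)."))
# prints: Net sales increased 32 unaudited
```

Note that removing periods also merges decimals ("3.2" becomes "32"), which is harmless for word-based sentiment features but would matter if numeric values were needed later.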

### RISK FACTOR SECTION OF DOCUMENT

In [72]:
# Trying to print the text of the Risk Factor section
start = 'ITEM 1A'
end = 'ITEM 2'
result = re.search('%s(.*)%s' % (start, end), pageText)

print(result)

None


# Quick Observation 
As you can see from the cell above, the search found no sections in the text labeled "ITEM 1A" or "ITEM 2". My assumption is that each company files its documents in a different style or order, so pulling out a certain section will be a long-winded task; something I will have to try when there isn't a deadline to meet.
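
One alternative explanation may be worth ruling out before concluding the headings are absent: by default `.` in Python's `re` module does not match newline characters, and the search is case-sensitive, so a section that spans several lines or is headed "Item 1A" in mixed case would also return `None`. The sketch below is a hedged variation of the search above using `re.DOTALL` and `re.IGNORECASE`; `extract_section` and the sample text are made up for illustration and were not run against the actual filing.

```python
import re


def extract_section(text, start="ITEM 1A", end="ITEM 2"):
    """Return the text between the first `start` and the next `end`, or None."""
    pattern = re.compile(
        re.escape(start) + r"(.*?)" + re.escape(end),
        flags=re.DOTALL | re.IGNORECASE,  # let '.' cross newlines, ignore case
    )
    match = pattern.search(text)
    return match.group(1).strip() if match else None


# Made-up sample, not taken from a real filing
sample = "Item 1A Risk Factors\nDemand may decline\nItem 2 Unregistered Sales"
print(extract_section(sample))
```

Even with these flags, the first match in a real 10-Q may be the table of contents entry rather than the section body, so the result would still need a manual check per company.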

## Next Steps
Since we now have the data we require for our sentiment analysis, we can move forward to the Natural Language Processing steps. Below you will see the code for downloading each company's 10-Qs between 2016 and 2020.

# Natural Language Processing Steps
Since we are working with multiple documents for each company, we will be using TF-IDF. The first thing we will do is create a list of the document paths for each company. The next thing we will need to do is clean up the documents: removing stop words and punctuation, and turning all uppercase letters to lowercase. From there a data frame will be created per document. These data frames are what we will be using for our sentiment analysis.
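
To make the TF-IDF idea concrete, here is a minimal pure-Python sketch, assuming the plain weighting tf × log(N / df) and a tiny illustrative stop-word list. The names `tokenize`, `tf_idf`, and `STOP_WORDS` are hypothetical and the sample sentences are made up; this is not the vectorizer used later in the project.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a fuller one
STOP_WORDS = {"the", "and", "of", "to", "in", "a", "is", "for"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["The revenue increased", "The revenue decreased"]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

A term that appears in every document ("revenue" here) gets a weight of zero, which is why TF-IDF suits a set of filings that share a lot of boilerplate language. Libraries such as scikit-learn apply a smoothed variant of the formula, so their numbers will differ from this sketch.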

**3M Co MMM**

In [75]:
MMM_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MMM/10-Q/0001558370-16-005213.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MMM/10-Q/0001558370-16-007105.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MMM/10-Q/0001558370-16-008937.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MMM/10-Q/0001558370-17-003391.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MMM/10-Q/0001558370-17-005582.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MMM/10-Q/0001558370-17-007801.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MMM/10-Q/0001558370-18-004248.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MMM/10-Q/0001558370-18-005773.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MMM/10-Q/0001558370-18-007892.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MMM/10-Q/0001558370-19-003408.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MMM/10-Q/0001558370-19-006397.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MMM/10-Q/0001558370-19-009143.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MMM/10-Q/0001558370-20-004478.txt",
]
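
Instead of typing each path by hand, the list for a company could be built from the download folder. The sketch below assumes the layout visible in the paths above (`<base>/sec_edgar_filings/<TICKER>/10-Q/*.txt`); `list_filings` is a hypothetical helper, not part of any library.

```python
from pathlib import Path


def list_filings(base_dir, ticker, filing_type="10-Q"):
    """Return the sorted .txt filing paths saved for one ticker."""
    # Folder layout assumed from the paths listed in this notebook
    folder = Path(base_dir) / "sec_edgar_filings" / ticker / filing_type
    return sorted(str(path) for path in folder.glob("*.txt"))


# Hypothetical usage with the project folder passed to Downloader above:
# MMM_Path = list_filings("/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone", "MMM")
```

Building the lists this way keeps them in step with whatever is on disk and avoids copy-and-paste slips when a new filing is downloaded.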

**Abbott Laboratories: ABT**

In [34]:
# # Get all 10-Q filings (ticker: ABT )
# dl.get("10-Q", "ABT", after_date="20160101")

13

In [76]:
ABT_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ABT/10-Q/0001104659-16-117710.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ABT/10-Q/0001104659-16-136572.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ABT/10-Q/0001104659-16-154583.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ABT/10-Q/0001104659-17-029430.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ABT/10-Q/0001104659-17-048855.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ABT/10-Q/0001104659-17-065654.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ABT/10-Q/0001104659-18-029831.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ABT/10-Q/0001104659-18-048736.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ABT/10-Q/0001104659-18-065076.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ABT/10-Q/0001104659-19-026018.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ABT/10-Q/0001104659-19-058323.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ABT/10-Q/0001104659-20-053384.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ABT/10-Q/0001410578-19-000606.txt",
]

**Accenture Plc: ACN**

In [41]:
# # Get all 10-Q filings (ticker: ACN )
# dl.get("10-Q", "ACN", after_date="20160101")

13

In [77]:
ACN_PATH = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ACN/10-Q/0001467373-16-000774.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ACN/10-Q/0001467373-16-000915.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ACN/10-Q/0001467373-16-001180.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ACN/10-Q/0001467373-17-000154.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ACN/10-Q/0001467373-17-000277.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ACN/10-Q/0001467373-17-000506.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ACN/10-Q/0001467373-18-000142.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ACN/10-Q/0001467373-18-000228.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ACN/10-Q/0001467373-18-000424.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ACN/10-Q/0001467373-19-000130.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ACN/10-Q/0001467373-19-000217.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ACN/10-Q/0001467373-19-000395.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ACN/10-Q/0001467373-20-000158.txt",
]

**Allegiant Travel Co: ALGT**

In [36]:
# # Get all 10-Q filings (ticker: ALGT )
# dl.get("10-Q", "ALGT", after_date="20160101")

12

In [78]:
ALGT_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ALGT/10-Q/0001362468-16-000060.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ALGT/10-Q/0001362468-16-000074.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ALGT/10-Q/0001362468-16-000096.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ALGT/10-Q/0001362468-17-000014.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ALGT/10-Q/0001362468-17-000033.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ALGT/10-Q/0001362468-17-000053.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ALGT/10-Q/0001362468-18-000016.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ALGT/10-Q/0001362468-18-000033.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ALGT/10-Q/0001362468-18-000047.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ALGT/10-Q/0001362468-19-000023.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ALGT/10-Q/0001362468-19-000036.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/ALGT/10-Q/0001362468-19-000047.txt",
]

**Baker Hughes Co: BKR**

In [45]:
# # Get all 10-Q filings (ticker: BKR )
# dl.get("10-Q", "BKR", after_date="20160101")

9

In [79]:
BKR_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BKR/10-Q/0001701605-17-000066.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BKR/10-Q/0001701605-17-000100.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BKR/10-Q/0001701605-18-000052.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BKR/10-Q/0001701605-18-000082.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BKR/10-Q/0001701605-18-000103.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BKR/10-Q/0001701605-19-000038.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BKR/10-Q/0001701605-19-000058.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BKR/10-Q/0001701605-19-000089.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BKR/10-Q/0001701605-20-000039.txt",
]

**Baxter International Inc: BAX**

In [46]:
# # Get all 10-Q filings (ticker: BAX )
# dl.get("10-Q", "BAX", after_date="20160101")

13

In [80]:
BAX_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BAX/10-Q/0001193125-16-582011.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BAX/10-Q/0001193125-16-671951.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BAX/10-Q/0001193125-16-761818.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BAX/10-Q/0001564590-17-008816.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BAX/10-Q/0001564590-17-016031.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BAX/10-Q/0001564590-17-021241.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BAX/10-Q/0001564590-18-012452.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BAX/10-Q/0001564590-18-019429.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BAX/10-Q/0001564590-18-026904.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BAX/10-Q/0001564590-19-016660.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BAX/10-Q/0001564590-19-027020.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BAX/10-Q/0001628280-20-003692.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BAX/10-Q/0001628280-20-005903.txt",
]

**Boeing Co: BA**

In [44]:
# # Get all 10-Q filings (ticker: BA )
# dl.get("10-Q", "BA", after_date="20160101")

13

In [81]:
BA_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BA/10-Q/0000012927-16-000113.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BA/10-Q/0000012927-16-000143.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BA/10-Q/0000012927-16-000153.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BA/10-Q/0000012927-17-000024.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BA/10-Q/0000012927-17-000044.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BA/10-Q/0000012927-17-000073.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BA/10-Q/0000012927-18-000018.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BA/10-Q/0000012927-18-000048.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BA/10-Q/0000012927-18-000065.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BA/10-Q/0000012927-19-000030.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BA/10-Q/0000012927-19-000063.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BA/10-Q/0000012927-19-000077.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/BA/10-Q/0000012927-20-000045.txt",
]

**Devon Energy Corp: DVN**

In [43]:
# # Get all 10-Q filings (ticker: DVN )
# dl.get("10-Q", "DVN", after_date="20160101")

13

In [82]:
DVN_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/DVN/10-Q/0001193125-16-577021.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/DVN/10-Q/0001564590-16-022255.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/DVN/10-Q/0001564590-16-026889.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/DVN/10-Q/0001564590-17-008268.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/DVN/10-Q/0001564590-17-014800.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/DVN/10-Q/0001564590-17-020684.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/DVN/10-Q/0001564590-18-010177.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/DVN/10-Q/0001564590-18-018317.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/DVN/10-Q/0001564590-18-027813.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/DVN/10-Q/0001564590-19-014637.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/DVN/10-Q/0001564590-19-029786.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/DVN/10-Q/0001564590-19-040774.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/DVN/10-Q/0001564590-20-021639.txt",
]

**Goldman Sachs Group Inc: GS**

In [42]:
# # Get all 10-Q filings (ticker: GS )
# dl.get("10-Q", "GS", after_date="20160101")

13

In [83]:
GS_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/GS/10-Q/0001193125-16-580382.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/GS/10-Q/0001193125-16-670394.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/GS/10-Q/0001193125-16-757421.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/GS/10-Q/0001193125-17-156659.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/GS/10-Q/0001193125-17-247803.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/GS/10-Q/0001193125-17-331475.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/GS/10-Q/0001193125-18-151188.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/GS/10-Q/0001193125-18-236920.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/GS/10-Q/0001193125-18-317325.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/GS/10-Q/0001193125-19-137184.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/GS/10-Q/0001193125-19-212578.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/GS/10-Q/0001193125-19-280752.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/GS/10-Q/0001193125-20-129324.txt",
]

**Johnson & Johnson: JNJ**

In [47]:
# # Get all 10-Q filings (ticker: JNJ )
# dl.get("10-Q", "JNJ", after_date="20160101")

13

In [84]:
JNJ_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/JNJ/10-Q/0000200406-16-000084.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/JNJ/10-Q/0000200406-16-000105.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/JNJ/10-Q/0000200406-16-000112.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/JNJ/10-Q/0000200406-17-000024.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/JNJ/10-Q/0000200406-17-000042.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/JNJ/10-Q/0000200406-17-000052.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/JNJ/10-Q/0000200406-18-000019.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/JNJ/10-Q/0000200406-18-000041.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/JNJ/10-Q/0000200406-18-000055.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/JNJ/10-Q/0000200406-19-000033.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/JNJ/10-Q/0000200406-19-000053.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/JNJ/10-Q/0000200406-19-000066.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/JNJ/10-Q/0000200406-20-000035.txt",
]

**Mastercard Inc: MA**

In [48]:
# # Get all 10-Q filings (ticker: MA )
# dl.get("10-Q", "MA", after_date="20160101")

13

In [85]:
MA_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MA/10-Q/0001141391-16-000135.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MA/10-Q/0001141391-16-000207.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MA/10-Q/0001141391-16-000235.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MA/10-Q/0001141391-17-000063.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MA/10-Q/0001141391-17-000125.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MA/10-Q/0001141391-17-000152.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MA/10-Q/0001141391-18-000056.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MA/10-Q/0001141391-18-000111.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MA/10-Q/0001141391-18-000127.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MA/10-Q/0001141391-19-000076.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MA/10-Q/0001141391-19-000128.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MA/10-Q/0001141391-19-000150.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MA/10-Q/0001141391-20-000091.txt",
]

**Morgan Stanley: MS**

In [49]:
# # Get all 10-Q filings (ticker: MS )
# dl.get("10-Q", "MS", after_date="20160101")

13

In [86]:
MS_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MS/10-Q/0000895421-20-000323.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MS/10-Q/0001193125-16-577875.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MS/10-Q/0001193125-16-670066.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MS/10-Q/0001193125-16-757027.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MS/10-Q/0001193125-17-158668.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MS/10-Q/0001193125-17-247402.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MS/10-Q/0001193125-17-331902.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MS/10-Q/0001193125-18-152899.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MS/10-Q/0001193125-18-238269.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MS/10-Q/0001193125-18-318377.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MS/10-Q/0001193125-19-136929.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MS/10-Q/0001628280-19-009973.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/MS/10-Q/0001628280-19-013331.txt",
]

**Philip Morris International Inc.: PM**

In [50]:
# # Get all 10-Q filings (ticker: PM )
# dl.get("10-Q", "PM", after_date="20160101")

13

In [87]:
PM_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PM/10-Q/0001413329-16-000082.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PM/10-Q/0001413329-16-000096.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PM/10-Q/0001413329-16-000105.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PM/10-Q/0001413329-17-000023.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PM/10-Q/0001413329-17-000043.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PM/10-Q/0001413329-17-000054.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PM/10-Q/0001413329-18-000023.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PM/10-Q/0001413329-18-000039.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PM/10-Q/0001413329-18-000053.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PM/10-Q/0001413329-19-000034.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PM/10-Q/0001413329-19-000055.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PM/10-Q/0001413329-19-000071.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PM/10-Q/0001413329-20-000032.txt",
]

**Prudential Financial Inc: PRU**

In [51]:
# # Get all 10-Q filings (ticker: PRU )
# dl.get("10-Q", "PRU", after_date="20160101")

13

In [88]:
PRU_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PRU/10-Q/0001137774-16-000243.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PRU/10-Q/0001137774-16-000256.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PRU/10-Q/0001137774-16-000266.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PRU/10-Q/0001137774-17-000092.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PRU/10-Q/0001137774-17-000104.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PRU/10-Q/0001137774-17-000119.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PRU/10-Q/0001137774-18-000075.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PRU/10-Q/0001137774-18-000082.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PRU/10-Q/0001137774-18-000099.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PRU/10-Q/0001137774-19-000070.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PRU/10-Q/0001137774-19-000086.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PRU/10-Q/0001137774-19-000099.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/PRU/10-Q/0001137774-20-000084.txt",
]

**Raytheon Technologies Corp: RTX**

In [52]:
# # Get all 10-Q filings (ticker: RTX )
# dl.get("10-Q", "RTX", after_date="20160101")

13

In [89]:
RTX_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/RTX/10-Q/0000101829-16-000062.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/RTX/10-Q/0000101829-16-000080.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/RTX/10-Q/0000101829-16-000091.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/RTX/10-Q/0000101829-17-000016.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/RTX/10-Q/0000101829-17-000034.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/RTX/10-Q/0000101829-17-000044.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/RTX/10-Q/0000101829-18-000009.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/RTX/10-Q/0000101829-18-000027.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/RTX/10-Q/0000101829-18-000043.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/RTX/10-Q/0000101829-19-000016.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/RTX/10-Q/0000101829-19-000038.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/RTX/10-Q/0000101829-19-000047.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/RTX/10-Q/0000101829-20-000034.txt",
]

**Simon Property Group Inc: SPG**

In [53]:
# # Get all 10-Q filings (ticker: SPG)
# dl.get("10-Q", "SPG", after_date="20160101")

13

In [90]:
SPG_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SPG/10-Q/0001047469-16-012892.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SPG/10-Q/0001047469-16-014660.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SPG/10-Q/0001558370-16-009130.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SPG/10-Q/0001558370-17-003316.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SPG/10-Q/0001558370-17-005905.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SPG/10-Q/0001558370-17-007678.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SPG/10-Q/0001558370-18-003817.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SPG/10-Q/0001558370-18-006144.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SPG/10-Q/0001558370-18-008301.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SPG/10-Q/0001558370-19-004352.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SPG/10-Q/0001558370-19-007355.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SPG/10-Q/0001558370-19-010077.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SPG/10-Q/0001558370-20-006251.txt",
]

**Skywest Inc: SKYW**

In [54]:
# # Get all 10-Q filings (ticker: SKYW )
# dl.get("10-Q", "SKYW", after_date="20160101")

13

In [91]:
# Commas separate the entries; without them Python joins adjacent string literals into one string
SKYW_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SKYW/10-Q/0001558370-16-005505.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SKYW/10-Q/0001558370-16-007497.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SKYW/10-Q/0001558370-16-009304.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SKYW/10-Q/0001558370-17-003792.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SKYW/10-Q/0001558370-17-005979.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SKYW/10-Q/0001558370-17-008114.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SKYW/10-Q/0001558370-18-004155.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SKYW/10-Q/0001558370-18-006579.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SKYW/10-Q/0001558370-18-008623.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SKYW/10-Q/0001558370-19-004074.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SKYW/10-Q/0001558370-19-007069.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SKYW/10-Q/0001558370-19-009963.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SKYW/10-Q/0001558370-20-005996.txt",
]

**Southern Co: SO**

In [55]:
# Get all 10-Q filings (ticker: SO )
# dl.get("10-Q", "SO", after_date="20160101")

13

In [92]:
# Commas separate the entries; without them Python joins adjacent string literals into one string
SO_Path = [
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SO/10-Q/0000092122-16-000144.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SO/10-Q/0000092122-16-000179.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SO/10-Q/0000092122-16-000213.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SO/10-Q/0000092122-17-000024.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SO/10-Q/0000092122-17-000065.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SO/10-Q/0000092122-17-000076.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SO/10-Q/0000092122-18-000027.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SO/10-Q/0000092122-18-000050.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SO/10-Q/0000092122-18-000062.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SO/10-Q/0000092122-19-000016.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SO/10-Q/0000092122-19-000037.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SO/10-Q/0000092122-19-000053.txt",
    "/Users/boimoriba/Documents/Learn.Co_Docs/Projects/Capstone/sec_edgar_filings/SO/10-Q/0000092122-20-000042.txt",
]
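
The hard-coded path lists could also be built from the download folder instead of being typed out. The following is a minimal sketch, assuming the `sec_edgar_filings/<TICKER>/10-Q/*.txt` layout seen in the paths above. `filing_paths` and `base_dir` are hypothetical names for illustration and are not part of `sec_edgar_downloader`.

```python
from pathlib import Path


def filing_paths(base_dir, ticker, form="10-Q"):
    """Return the .txt filing paths for one ticker (hypothetical helper)."""
    folder = Path(base_dir) / ticker / form
    # sorted() gives a stable, lexicographic order by filename;
    # that order is not guaranteed to be chronological
    return [str(p) for p in sorted(folder.glob("*.txt"))]
```

With a helper like this, `SO_Path = filing_paths("sec_edgar_filings", "SO")` would stand in for the list above, provided the base folder is given relative to the notebook or as a full path.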

# Stop Words & Punctuation Removal
The plan is to write a function that removes the stop words and punctuation from each company's documents.

# Removing Stop Words

In [None]:
# # Reading in the document
# PATH = 
# # The with-block closes the file automatically, even if reading fails
# with open(PATH, "r", encoding="utf-8") as file:
#     text = file.read()

In [None]:
# soup = BeautifulSoup(text, 'lxml') # Parsing document

In [None]:
# # Remove all script and style elements
# for script in soup(["script", "style"]):
#     script.extract()

In [None]:
# # Assign what's left to a string
# pageText = soup.body.get_text()

In [None]:
# # Getting rid of punctuation characters in the document (hyphens are kept)
# pageText = "".join(c for c in pageText if c not in '!"#$%&\'()*+,./:;<=>?@[\\]^_`{|}~') 
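
The same character filter can also be written with `str.translate`, which deletes the listed characters in a single pass. This is a minimal sketch, not the notebook's final approach. `PUNCT_TO_DROP` and `strip_punctuation` are illustrative names, and the character set is copied from the cell above.

```python
# Characters to drop, copied from the cell above (the hyphen is not included)
PUNCT_TO_DROP = '!"#$%&\'()*+,./:;<=>?@[\\]^_`{|}~'

# Translation table that maps each listed character to None, i.e. deletes it
_DROP_TABLE = str.maketrans("", "", PUNCT_TO_DROP)


def strip_punctuation(text):
    """Return text with every character in PUNCT_TO_DROP removed."""
    return text.translate(_DROP_TABLE)
```

For instance, `strip_punctuation(pageText)` would return the same string as the `join` expression above.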

# Learn.Co Stuff

In [None]:
'''Code from Learn.co that I want to turn into a function that goes through each document for the list 
of companies and cleans it up by removing punctuation, stopwords, and uppercase letters''' 

# from nltk.corpus import stopwords
# import string

# # Get all the stop words in the English language
# stopwords_list = stopwords.words('english')

# # It is generally a good idea to also remove punctuation

# # Now we have a list that includes all english stopwords, as well as all punctuation
# stopwords_list += list(string.punctuation)


# from nltk import word_tokenize

# tokens = word_tokenize(some_text_data)

# # It is usually a good idea to lowercase all tokens during this step, as well
# # (lowercase before the lookup so capitalized stop words such as "The" are also removed)
# stopped_tokens = [w.lower() for w in tokens if w.lower() not in stopwords_list]
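
The steps in this cell could be wrapped into one function along these lines. This is a sketch under stated assumptions, not the Learn.co code: `clean_tokens` is a hypothetical name, a simple regex stands in for `word_tokenize`, and the stop words are passed in as an argument so that `stopwords.words('english')` can be supplied by the caller.

```python
import re
import string


def clean_tokens(text, stop_words):
    """Lowercase, tokenize, and drop stop words and punctuation (hypothetical helper)."""
    # Simple stand-in for nltk.word_tokenize: runs of letters, digits, or apostrophes
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    blocked = set(stop_words) | set(string.punctuation)
    # Tokens are already lowercase, so they match a lowercase stop word list
    return [t for t in tokens if t not in blocked]
```

Calling `clean_tokens(pageText, stopwords.words('english'))` once per filing would give one token list per document. The regex tokenizer splits differently from `word_tokenize` in places (contractions, for example), so results may not match NLTK exactly.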

In [None]:
# # Code from Learn.co to check frequency
# from nltk import FreqDist
# freqdist = FreqDist(tokens)

# # get the 200 most common words 
# most_common = freqdist.most_common(200)
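
For reference, `FreqDist` is implemented as a subclass of the standard library's `collections.Counter`, so the same kind of count can be sketched without NLTK. The token list below is made-up sample data, not output from a filing.

```python
from collections import Counter

# Made-up sample tokens standing in for a cleaned filing
sample_tokens = ["revenue", "risk", "revenue", "debt", "risk", "revenue"]

# Count how often each token appears
freq = Counter(sample_tokens)

# The two most common tokens with their counts
top_two = freq.most_common(2)
```

Counting on the cleaned tokens rather than the raw ones keeps stop words and punctuation from dominating the top of the list.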

# Lemmatization

In [None]:
# from nltk.stem.wordnet import WordNetLemmatizer

# lemmatizer = WordNetLemmatizer()

# lemmatizer.lemmatize('feet') # foot
# lemmatizer.lemmatize('running', pos='v') # run (the default pos='n' would return 'running')

# Resources 

**The Idea**

The Blog That Led to This Project: https://towardsdatascience.com/useful-sentiment-analysis-mining-sec-filings-part-1-358942fc98ed

**Newer Library Used**

Downloading 10-Q: https://sec-edgar-downloader.readthedocs.io/en/latest/

Edgar Documentation: https://pypi.org/project/edgar/

**Companies** 

List of SEC Registered Companies: https://www.sec.gov/Archives/edgar/cik-lookup-data.txt




# Human Resources

Bryan Arnold 02172020 DS Lead Instructor: https://www.kaggle.com/puremath86

Lindsey Berlin 02172020 DS Coach: https://github.com/lindseyberlin