10-K Filing Sentiment Analysis (NLP/ML) - Approach 1

Problem Description:

A 10-K financial report is a comprehensive report that all publicly traded companies must file annually to describe their financial performance.

These reports are filed with the U.S. Securities and Exchange Commission (SEC) and are typically even more detailed than a company's annual report.

A 10-K contains information about the business's operations, risk factors, selected financial data, management's discussion and analysis (MD&A), and the financial statements with supplementary data.

10-K reports are important to investors because they describe the company's potential to succeed.

Business Understanding

A 10-K filing is split into four parts:

Part 1 - Gives an overview of the business.
Part 2 - Discusses the firm's financial standing and the securities it trades in the financial markets.
Part 3 - Contains disclosures about important company personnel and their families.
Part 4 - Contains the financial statements and exhibits (tables) that accompany the 10-K.

1. Business.

This provides an overview of the company’s main operations, including its products and services (i.e., how it makes money).

2. Management's Discussion & Analysis (MD&A).

Also known as MD&A, this gives the company an opportunity to explain its business results from the previous fiscal year. This section is where the company can tell its story in its own words.

3. Risk factors.

These outline any and all risks the company faces or may face in the future. The risks are typically listed in order of importance.

As a data scientist, you are given the following resources:

  1. ".txt" links to the companies' filings, their filing years, and company "CIK" numbers.
  2. A dictionary file to vectorise the text data and capture the sentiment (see the sketch below).
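To make the second resource concrete, here is a minimal sketch of dictionary-based sentiment scoring. The file name, the column layout ("Word", "Positive", "Negative"), and the polarity formula are assumptions for illustration; they follow the shape of the widely used Loughran-McDonald master dictionary and may differ from the dictionary actually supplied with this repository.

```python
# Minimal sketch: score a filing's sentiment from a word-list dictionary.
# Assumed (not stated in this README): "sentiment_dictionary.csv" has columns
# "Word", "Positive", "Negative", with non-zero values flagging sentiment words.
import re
import pandas as pd

def load_word_lists(path="sentiment_dictionary.csv"):
    """Return sets of positive and negative words from the dictionary file."""
    dictionary = pd.read_csv(path)
    positive = set(dictionary.loc[dictionary["Positive"] > 0, "Word"].str.lower())
    negative = set(dictionary.loc[dictionary["Negative"] > 0, "Word"].str.lower())
    return positive, negative

def sentiment_score(text, positive, negative):
    """Simple polarity score: (positive hits - negative hits) / total tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    pos_hits = sum(token in positive for token in tokens)
    neg_hits = sum(token in negative for token in tokens)
    return (pos_hits - neg_hits) / len(tokens)
```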

OBJECTIVES:

  1. Extracting the data from the SEC EDGAR database using the given links (see the combined sketch after this list).

  2. Cleaning the data.

  3. Understanding the data and extracting meaningful insights.

  4. Building a robust model with a validation strategy to make predictions on the long-term outlook.

  5. Building models without the dictionary, using tf-idf.
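As referenced in objective 1, the sketch below walks through the objectives end to end under stated assumptions: a "filings.csv" file holding the given ".txt" links plus a hypothetical binary target "long_term_up", a crude regex-based cleaner, and a tf-idf + logistic-regression pipeline evaluated with 5-fold cross-validation. None of these file names, columns, or model choices are taken from the repository; they only illustrate the workflow.

```python
# Rough end-to-end sketch: download raw ".txt" filings from EDGAR, clean them,
# and fit a tf-idf model with cross-validation. File names, the label column,
# and the classifier are illustrative assumptions, not the project's actual setup.
import re
import pandas as pd
import requests
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# SEC asks automated clients to identify themselves in the User-Agent header.
HEADERS = {"User-Agent": "your-name your-email@example.com"}

def fetch_filing(url):
    """Download one raw filing text from EDGAR."""
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.text

def clean_filing(raw):
    """Very rough cleaning: drop markup, keep letters, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)        # strip HTML/XBRL tags
    text = re.sub(r"&[a-z]+;", " ", text)       # strip HTML entities
    text = re.sub(r"[^A-Za-z ]+", " ", text)    # keep letters only
    return re.sub(r"\s+", " ", text).lower().strip()

# Assumed columns: "url" (the given .txt link) and "long_term_up" (hypothetical label).
filings = pd.read_csv("filings.csv")
texts = [clean_filing(fetch_filing(url)) for url in filings["url"]]

model = make_pipeline(
    TfidfVectorizer(max_features=20_000, stop_words="english", ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(model, texts, filings["long_term_up"], cv=5)
print("cross-validated accuracy:", scores.mean())
```

The pipeline keeps the vectoriser inside the cross-validation loop, so the tf-idf vocabulary is learned only from each training fold; that is the main point of pairing tf-idf with a validation strategy as in objectives 4 and 5.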