Files

ENRON_NLTK_ML.html
ENRON_NLTK_ML.ipynb
P5_ENRON_FINANCIAL.html
P5_ENRON_FINANCIAL.ipynb
Project 5 Report.pdf
README.md
References.txt
my_classifier.pkl
my_dataset.pkl
my_feature_list.pkl
poi_id.py

README.md

UdacityDAND_Project5

Machine Learning project: Detect Fraud in Enron Data. This project works with two datasets, the Enron financial data and the Enron emails, both of which describe Enron employees. The goal is to build a model that detects whether a given employee was part of the fraud, as identified by the investigations. The financial data is numeric, so regular machine learning techniques were used. The emails are text, so Natural Language Processing was used.

Files:

  • ENRON_NLTK_ML.ipynb: My work on the email text data. This includes only the final work; earlier work that led to it (such as failed or unhelpful attempts and the use of NLTK) is not included.

  • P5_ENRON_FINANCIAL.ipynb: My work on the financial data.

  • HTML versions of both .ipynb files, for easier viewing.

  • poi_id.py: Contains only my final classifier. Any other attempts are in the Jupyter notebook ENRON_NLTK_ML.ipynb.

  • my_classifier.pkl, my_dataset.pkl, my_feature_list.pkl: Pickled files for the financial data.

  • References.txt: A list of some references used during the project.
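
The pickled files listed above can be read back with Python's standard `pickle` module. The following is a minimal sketch, not part of the project: the helper names `load_pickle` and `load_project_artifacts` are hypothetical, and the idea that the three files hold a classifier, a dataset and a feature list is an assumption based only on their file names.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a single pickled object from `path`."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def load_project_artifacts(directory="."):
    """Return (classifier, dataset, feature_list) from the three .pkl files.

    The file names come from the repository listing. What each object
    actually contains is an assumption, not something the README states.
    """
    base = Path(directory)
    classifier = load_pickle(base / "my_classifier.pkl")
    dataset = load_pickle(base / "my_dataset.pkl")
    feature_list = load_pickle(base / "my_feature_list.pkl")
    return classifier, dataset, feature_list
```

Calling `load_project_artifacts()` from the repository root would return the three objects. Unpickling a trained classifier generally requires the library that created it to be installed, and pickle files should only be loaded from a trusted source, because unpickling can execute arbitrary code.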
