
UdacityDAND_Project5

Machine Learning project: Detect Fraud in Enron Data. This project works with two datasets, the ENRON financial data and the ENRON emails, both of which describe ENRON employees. The goal is to build a model that detects whether a given employee was part of the fraud, as identified by the investigations. The financial data is numeric, so regular machine learning techniques were used. The emails are textual, so Natural Language Processing was used.

Files:

  • ENRON_NLTK_ML.ipynb: My work on the email text data. This includes only the final work. Anything that led up to it (such as wrong or useless attempts, or usage of NLTK) is not included.

  • P5_ENRON_FINANCIAL.ipynb: My work on the financial data.

  • HTML versions of both .ipynb files, for easier viewing.

  • poi_id.py: Includes only my final classifier. Any other attempts are found in the Jupyter notebook ENRON_NLTK_ML.ipynb.

  • Pickled files for the financial data (the .pkl files) are included as well.

  • A list of some references used during the project.
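The pickled files listed above can be inspected with Python's standard `pickle` module. The following is a minimal sketch, not code from this project: the helper names `load_pickle` and `load_project_artifacts` are hypothetical, and it assumes the three files in the repository (`my_classifier.pkl`, `my_dataset.pkl`, `my_feature_list.pkl`) sit in the directory you pass in.

```python
import pickle
from pathlib import Path

# File names as they appear in the repository.
ARTIFACT_NAMES = ["my_classifier.pkl", "my_dataset.pkl", "my_feature_list.pkl"]


def load_pickle(path):
    """Load a single pickled object from `path` (hypothetical helper)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def load_project_artifacts(directory="."):
    """Return {file name: loaded object} for each artifact found in `directory`.

    Files that are missing are skipped rather than raising an error.
    """
    base = Path(directory)
    return {
        name: load_pickle(base / name)
        for name in ARTIFACT_NAMES
        if (base / name).exists()
    }


if __name__ == "__main__":
    artifacts = load_project_artifacts(".")
    # Print which artifacts were found and the type of each loaded object.
    for name, obj in sorted(artifacts.items()):
        print(name, type(obj).__name__)
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code. If the files were written under Python 2, `pickle.load` may need the `encoding="latin1"` argument, and loading the classifier will likely require the library that produced it to be installed.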
