UdacityDAND_Project5
Machine Learning project: Detect Fraud in Enron Data. This project works with two datasets about ENRON employees: the ENRON financial data and the ENRON emails. The goal is to build a model that detects whether a given employee was involved in the fraud, as indicated by the investigations. The financial data is numeric, so standard machine learning techniques were used; the emails are textual data, so Natural Language Processing was used.
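This README does not describe the exact NLP pipeline used on the emails, so the sketch below only illustrates the general idea of text classification. It uses a bag-of-words multinomial Naive Bayes classifier with Laplace smoothing, written in pure Python. This technique is chosen purely for illustration and is not necessarily the one used in the notebook. The toy emails, the labels `"poi"` / `"non_poi"`, and the helper names `tokenize`, `train_nb`, and `predict_nb` are hypothetical and do not come from the Enron corpus.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and keep only alphabetic tokens (hypothetical, minimal preprocessing;
    # a real pipeline might also stem words or drop stopwords).
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace (add-alpha) smoothing."""
    classes = sorted(set(labels))
    doc_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))

    # Shared vocabulary across all classes.
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)

    model = {"classes": classes, "vocab": vocab, "log_prior": {}, "log_lik": {}}
    for c in classes:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_prior"][c] = math.log(doc_counts[c] / len(docs))
        model["log_lik"][c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return model


def predict_nb(model, text):
    """Return the class with the highest log-posterior score for `text`."""
    scores = {}
    for c in model["classes"]:
        score = model["log_prior"][c]
        for w in tokenize(text):
            if w in model["vocab"]:  # words never seen in training are ignored
                score += model["log_lik"][c][w]
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data (NOT real Enron emails).
docs = [
    "move the losses into the special purpose entity before the audit",
    "hide the debt off the balance sheet with the partnership",
    "lunch meeting on friday with the team",
    "please review the holiday schedule for the office",
]
labels = ["poi", "poi", "non_poi", "non_poi"]

model = train_nb(docs, labels)
print(predict_nb(model, "shift the debt to the partnership"))  # prints "poi"
print(predict_nb(model, "team lunch on friday"))  # prints "non_poi"
```

Working in log space avoids numeric underflow when many word probabilities are multiplied. Smoothing keeps a word that is missing from one class from forcing that class's probability to zero.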
Files:
- ENRON_NLTK_ML.ipynb: My work on the email text data. It includes only the final work. Earlier attempts that led up to it (such as wrong or unhelpful approaches and exploratory NLTK usage) are not included.
- P5_ENRON_FINANCIAL.ipynb: My work on the financial data.
- HTML versions of both .ipynb files, for easier viewing.
- poi_id.py: Includes only my final classifier. All other attempts can be found in the Jupyter notebook ENRON_NLTK_ML.ipynb.
- Pickled files for the financial data are included as well.
- A list of some references used during the project.
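The pickled financial data can be loaded with Python's standard `pickle` module. The README does not document the file names or layout, so this is a minimal sketch built on assumptions. It assumes the pickle holds a dict that maps each employee name to a dict of features, with missing values stored as the string `"NaN"`, as in the Udacity starter dataset. The file name `financial_data.pkl`, the helper `load_features`, and the sample records are all hypothetical.

```python
import os
import pickle
import tempfile


def load_features(path, feature_names):
    """Load an (assumed) pickled {name: {feature: value}} dict.

    Returns the sorted employee names and one list of floats per employee.
    """
    with open(path, "rb") as f:
        data = pickle.load(f)
    names = sorted(data)
    rows = []
    for name in names:
        record = data[name]
        # Assumed convention: the string "NaN" (or a missing key) means "no value";
        # it is replaced by 0.0 here, which is only one of several possible choices.
        rows.append([
            0.0 if record.get(feat, "NaN") == "NaN" else float(record[feat])
            for feat in feature_names
        ])
    return names, rows


# Hypothetical sample records written to a temporary pickle, for demonstration only.
sample = {
    "EMPLOYEE B": {"poi": False, "salary": 200000, "bonus": "NaN"},
    "EMPLOYEE A": {"poi": True, "salary": "NaN", "bonus": 5000000},
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "financial_data.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(sample, f)
    names, rows = load_features(path, ["poi", "salary", "bonus"])

print(names)  # ['EMPLOYEE A', 'EMPLOYEE B']
print(rows)   # [[1.0, 0.0, 5000000.0], [0.0, 200000.0, 0.0]]
```

Only unpickle files from trusted sources, because loading a pickle can execute arbitrary code.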