I am developing this project as a scholar in the 2019 edition of LearnIT Girl.
The Enron dataset is a trove of information regarding the Enron Corporation, an energy, commodities, and services company that infamously went bankrupt in December 2001 as a result of fraudulent business practices. In the aftermath of the company’s collapse, the Federal Energy Regulatory Commission (FERC) released more than 1.6 million emails sent and received by Enron executives, of which about 0.5 million remain public. The dataset contains the emails themselves, metadata about the emails such as the number received by and sent from each individual, and financial information including salary and stock options.
The aim is to develop a machine learning model that can identify the persons of interest (POIs) from the features in the data. The POIs are the individuals who were eventually tried for fraud or criminal activity in the Enron investigation. This involves studying and cleaning the dataset, engineering features, picking and tuning an algorithm, and evaluating and testing the identifier against an available list of actual POIs in the fraud case. The text of the emails and the financial information act as input for the model. The project uses Python libraries such as scikit-learn, NumPy, Matplotlib, and pandas.
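As a rough illustration of the "pick and tune an algorithm, then evaluate" steps, here is a minimal sketch using scikit-learn. It is not the project's actual code: the data is synthetic, and the feature count, the choice of logistic regression, and the parameter grid are all assumptions made for the example. Precision and recall are reported because POIs are a small minority class, so plain accuracy would be misleading.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption): 140 people,
# 4 numeric features, 20 of them labelled as POIs (label 1).
rng = np.random.default_rng(0)
n_people = 140
y = np.zeros(n_people, dtype=int)
y[:20] = 1
X = rng.normal(size=(n_people, 4))
X[y == 1] += 1.5  # shift the POI rows so there is some signal to learn

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Scale the features, then classify; class_weight="balanced" compensates
# for the small number of POIs.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Tune the regularisation strength with stratified cross-validation,
# selecting on F1 rather than accuracy.
search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.1, 1.0, 10.0]},
    scoring="f1",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

# Evaluate the best model on the held-out people.
y_pred = search.predict(X_test)
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
print("best params:", search.best_params_)
print("precision:", round(precision, 2), "recall:", round(recall, 2))
```

With the real data, `X` and `y` would come from the cleaned financial and email features and the known POI list rather than from a random generator.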
The ultimate objective of investigating the Enron dataset is to use machine learning to predict cases of fraud or unsafe business practices in general, far in advance.