This repository contains a collection of Python scripts that perform Exploratory Data Analysis and demonstrate
the use of Machine Learning and Deep Learning techniques in various prediction and classification problems.
I will use this repository to post small Python snippets, each written with one purpose in mind. They may or may not run end-to-end.
List of Files
- "Example Scripts for doing basic data manipulation". The data manipulation script, datamanip.py, contains small snippets that can be pasted into a Jupyter notebook to see how different kinds of data manipulation can be done. It cannot be run by itself; users will need to adapt it to their own dataset.
- "Code for counting words, lines, characters for text manipulation". Parts of this script can be pasted into a Jupyter notebook to see how text manipulation, such as counting words, sentences, etc., can be done when reading a text file.
- "Python Script using deep learning models on the famous digit classification problem". This script is a quick demo of how the digit classification problem can be approached using deep learning models.
- "Python Script demonstrating use of Scrapy package to scrape contents". This script demonstrates the use of the Scrapy package in Python to generate a CSV file. The example is a working script that scrapes details from a jockey racing site.
- "Scripts demonstrating use of three decision tree models on house price prediction". This folder contains scripts that can be adapted to business problems involving prediction. The scripts demonstrate the use of XGBoost, Decision Tree, and Random Forest methods for price prediction.
- "Scripts demonstrating application of Machine Learning to Fraud Detection". This folder contains scripts that can be adapted to business problems involving fraud detection. The scripts will be added gradually, starting with exploratory analysis and moving on to understanding and dealing with imbalanced datasets.
- "Application of NLP and topic modeling". This folder contains scripts that can be adapted to topic modelling challenges for unstructured textual data. The scripts show a simple end-to-end NLP topic modelling process applied to airline reviews. The dataset was scraped from airline review sites and unfortunately cannot be shared here.
- "Tracking folder changes using Python". This folder contains a script that monitors a given directory for changes. For as long as the user keeps answering y at the monitor prompt, the script tracks changes to the directory path entered by the user. The code currently works for only one directory, including all the files and subdirectories inside it. It tracks additions, deletions, and modifications to the files and folders within the directory.
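
The folder-tracking idea in the last item can be illustrated with a minimal sketch. This is not the script from the repository: the function names `snapshot`, `diff_snapshots`, and `monitor` are hypothetical, and the approach (comparing modification times from two `os.walk` passes) is an assumption about one simple way to detect additions, deletions, and modifications.

```python
import os


def snapshot(root):
    """Map every file and subdirectory under root to its modification time."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime_ns
            except OSError:
                # The entry disappeared between listing and stat; skip it.
                continue
    return state


def diff_snapshots(before, after):
    """Return (added, deleted, modified) path lists between two snapshots."""
    added = sorted(set(after) - set(before))
    deleted = sorted(set(before) - set(after))
    modified = sorted(
        path for path in before.keys() & after.keys()
        if before[path] != after[path]
    )
    return added, deleted, modified


def monitor(root):
    """Hypothetical prompt loop: re-scan for as long as the user answers y."""
    previous = snapshot(root)
    while input("Continue monitoring? (y/n): ").strip().lower() == "y":
        current = snapshot(root)
        added, deleted, modified = diff_snapshots(previous, current)
        print("Added:", added)
        print("Deleted:", deleted)
        print("Modified:", modified)
        previous = current
```

Calling `monitor` with a directory path starts the prompt loop. Note that in this sketch a subdirectory is usually reported as modified when entries are added to or removed from it, because its own modification time changes.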