ML project using PySpark
This file uses PySpark to predict whether a flight will be delayed.
Several factors are taken into consideration, such as miles, departure, carrier, and org. Basic PySpark functions are used for data exploration. The dataset includes multiple categorical variables, which are handled by converting them to quantitative values. The labels are created from the "delay" column: values greater than 20 are labeled 1, and all others are labeled 0. The models used are "Decision Tree" and "Logistic Regression".
The models do not perform particularly well, but the project demonstrates the basic usage of PySpark for machine learning.