DiptoChakrabarty/Data-Science-with-Ml

Data-Science-with-Ml

Data Science and Machine Learning

PLATFORM USED : DATABRICKS

LIBRARIES USED :

1) Pandas

•Pandas was used to read and organize the data, since its DataFrame representation is well suited to data analysis.

•Pandas makes the data easy to inspect and supports operations on individual columns, for example when filtering rows.

•To manage memory, pandas makes it easy to convert a column from one data type to another, and it also helps fill in missing values.

•Because pandas is a Python library, its code reads like ordinary Python, which makes it quick to learn and apply.
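
The points above can be illustrated with a minimal sketch. The column names and sample values below are hypothetical and are not taken from the repository's notebooks; they only show reading data, filling missing values, converting column types, and filtering on a column.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
CSV_TEXT = """city,temperature,humidity
Delhi,31.5,40
Mumbai,,75
Kolkata,29.0,
Delhi,33.0,38
"""


def load_and_clean(csv_text: str) -> pd.DataFrame:
    """Read CSV text, fill missing values, and shrink column types."""
    df = pd.read_csv(io.StringIO(csv_text))
    # Fill missing temperatures with the column mean.
    df["temperature"] = df["temperature"].fillna(df["temperature"].mean())
    # Fill missing humidity with 0, then convert from float64 to int32.
    df["humidity"] = df["humidity"].fillna(0).astype("int32")
    # Repeated strings take less memory as a categorical column.
    df["city"] = df["city"].astype("category")
    return df


df = load_and_clean(CSV_TEXT)

# Filter rows using a single column.
delhi = df[df["city"] == "Delhi"]

print(df.dtypes)
print(delhi)
```

Filling with the mean or with 0 is only one possible choice; the right strategy depends on the dataset.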

2) Spark

•Spark supports several languages, including Python, Scala, Java, R, and SQL.

•Spark supports lots of different

•Spark runs on many platforms, and because it processes data in memory, operations on Apache Spark are typically much faster than the equivalent Hadoop MapReduce jobs.

•Spark's MLlib library provides machine learning models such as Linear Regression, Logistic Regression, Decision Trees, and K-means Clustering.

3) Matplotlib

•The Matplotlib library was used to represent the data graphically.

•Data can be shown as histograms, bar graphs, pie charts, and other plot types using this library.

•It works well with pandas, since DataFrame columns can be passed directly to its plotting functions.
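
A minimal sketch of the three chart types mentioned above, drawn from a pandas DataFrame. The data and column names are hypothetical; the `Agg` backend is selected only so the example also runs without a display, and the figure is written to an in-memory buffer rather than a file.

```python
import io

import matplotlib

matplotlib.use("Agg")  # Non-interactive backend; not needed in a notebook.

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "city": ["Delhi", "Mumbai", "Kolkata", "Chennai"],
        "sales": [120, 90, 60, 30],
    }
)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Histogram of a numeric column.
axes[0].hist(df["sales"], bins=4)
axes[0].set_title("Histogram")

# Bar graph: one bar per category.
axes[1].bar(df["city"], df["sales"])
axes[1].set_title("Bar graph")

# Pie chart: each category's share of the total.
axes[2].pie(df["sales"], labels=df["city"], autopct="%1.0f%%")
axes[2].set_title("Pie chart")

fig.tight_layout()

# Render to an in-memory PNG; use fig.savefig("charts.png") to write a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In a Jupyter or Databricks notebook the figure would normally be displayed inline instead of being saved.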

About

Data Science and Ml in jupyter notebook
