This is a Data Science repository for learning and for working through projects and problems. Kaggle problems will be included here, along with a curated list of Python resources and programs. It is also a space for keeping the data and source code from the book up to date after writing about data manipulation with Python. Because IT changes so quickly, you often run into situations like these after writing a book: the website you want to analyze has changed, or the latest version of a module has changed its syntax. So this space will not simply publish the data and source code covered by the book; it will actively keep that source code current for readers. Exceptions are made, however, when the site being analyzed disappears or when a module no longer supports version upgrades.
What is Data Science?
Data Science is one of the hottest topics in computing and on the Internet nowadays. Until now, people have gathered data from applications and systems; now is the time to analyze it. The next steps are producing suggestions from the data and predicting the future.
Most newcomers are unsure about one thing: how the work of a Data Scientist differs from that of a Data Analyst.
Data Scientist | Data Analyst |
---|---|
Perform ad-hoc data mining and gather large sets of structured and unstructured data from several sources. | Gather data from various databases and warehouses, filter and clean it. |
Use various statistical methods and data visualization techniques to design and evaluate advanced statistical models from vast volumes of data. | Write complex SQL queries and scripts to collect, store, manipulate, and retrieve data from RDBMS such as MS SQL Server, Oracle DB, and MySQL. |
Automate tedious tasks and generate insights using machine learning models. | Spot trends and patterns from complex datasets. |
Build AI models using various algorithms and in-built libraries. | Create different reports with the help of charts and graphs using Excel and BI tools. |
- Excel Learning
- Statistics
- Linear Algebra
- Calculus
- Python
- SQL
- Power BI
- Tableau
- EDA
- Cloud (AWS/Azure)
- Deep Learning
This is the way I approached my journey. If you would rather follow a different path, it's up to you.
- pandas - Data structures built on top of numpy.
- scikit-learn - Core ML library.
- matplotlib - Plotting library.
- seaborn - Data visualization library based on matplotlib.
- datatile - Basic statistics using `DataFrameSummary(df).summary()`.
- pandas_profiling - Descriptive statistics using `ProfileReport`.
- sklearn_pandas - Helpful `DataFrameMapper` class.
- missingno - Missing data visualization.
- rainbow-csv - Plugin to display .csv files with nice colors.
- General Jupyter Tricks
- Fixing environment: link
- Python debugger (pdb) - blog post, [video](https://www.youtube.com/watch?v=Z0ssNAbe81M&t=1h44m15s), cheatsheet
- cookiecutter-data-science - Project template for data science projects.
- nteract - Open Jupyter Notebooks with doubleclick.
- papermill - Parameterize and execute Jupyter notebooks, tutorial.
- nbdime - Diff two notebook files, Alternative GitHub App: ReviewNB.
- RISE - Turn Jupyter notebooks into presentations.
- qgrid - Pandas `DataFrame` sorting.
- pivottablejs - Drag n drop Pivot Tables and Charts for jupyter notebooks.
- itables - Interactive tables in Jupyter.
- jupyter-datatables - Interactive tables in Jupyter.
- debugger - Visual debugger for Jupyter.
- nbcommands - View and search notebooks from terminal.
- handcalcs - More convenient way of writing mathematical equations in Jupyter.
- notebooker - Productionize and schedule Jupyter Notebooks.
- bamboolib - Intuitive GUI for tables.
- voila - Turn Jupyter notebooks into standalone web applications.
- voila-gridstack - Voila grid layout.
- 1000 Data Science Projects you can run on the browser with IPython.
- #tidytuesday A weekly data project aimed at the R ecosystem.
- Data science your way
- PySpark Cheatsheet
- Machine Learning, Data Science and Deep Learning with Python
- How To Label Data
- Your Guide to Latent Dirichlet Allocation
- Over 1000 Data Science Online Courses at Classpert Online Search Engine
- Tutorials of source code from the book Genetic Algorithms with Python by Clinton Sheppard
- Tutorials to get started on signal processing for machine learning
- Realtime deployment Tutorial on Python time-series model deployment.
- Python for Data Science: A Beginner’s Guide
- Minimum Viable Study Plan for Machine Learning Interviews
- Understand and Know Machine Learning Engineering by Building Solid Projects
- Matplotlib.
- Netron.
- plot.ly.
- raw.
- Seaborn.
- Wrangler.
- TensorWatch.
- Data Science IPython Notebooks
- Awesome Python - Data Analysis
- Statistics
- An Introduction to Scientific Python (and a Bit of the Maths Behind It) – NumPy
- Data Analysis and IPython Notebooks
- Python for Data Science: Basic Concepts
- Pycon India 2015 Notes
- 5 important Python Data Science advancements of 2015
- Data Exploration with Numpy cheat sheet
- Querying Craigslist with Python
- An introduction to Numpy and Scipy
- Create NBA Shot Charts
- PythoR- Python meets R
- How do I learn data analysis with Python?
- What are some interesting things to do with Python?
- Which is better for data analysis: R or Python?
- Web scraping in Python
- The Guide to Learning Python for Data Science
- Python For Data Science - A Cheat Sheet For Beginners
- Top voted Python data science questions
- Awesome Python - Data Visualization
- Awesome Python - Map Reduce
- Intro to pandas data structures
- Useful Pandas Cheatsheet
- An Introduction to Scientific Python – Pandas
- 10 minutes to Pandas
- Useful Pandas Snippets
- Timeseries analysis using Pandas
- Pandas Exercises - Practice your Pandas skills
- Grouping in Pandas
- “Large data” work flows using pandas
- Easier data analysis with pandas (video series)
- Pandas Basics Cheat Sheet
- Quick Operations on a Pandas DataFrame
- Renaming Columns in Pandas (video)
- Deleting Columns from pandas DataFrame (video)
- Adding new Column to existing DataFrame
- Add one Row in a pandas.DataFrame
- Changing the order of DataFrame Columns
- Changing data type of Columns (video)
- Getting a list of the column headers from a DataFrame
- Converting list of dictionaries to Dataframe
- Getting row count of pandas DataFrame
- Most efficient way to loop through DataFrames
- Deleting DataFrame row based on column value
- Dropping a list of rows from Pandas DataFrame
- Sorting a DataFrame or a single column
- Filtering DataFrame rows by column value
- Filtering DataFrame rows using multiple criteria
- Dropping all non-numeric columns from a DataFrame
- Counting and removing missing values
- Selecting multiple rows and columns from a DataFrame
- Reducing the size of a DataFrame
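
Several of the pandas tasks named in the list above (column headers, row count, missing values, filtering, dropping non-numeric columns, sorting) can be sketched in a few lines. This is only a minimal illustration, assuming pandas is installed; the `df` table and its `name`, `age`, and `city` columns are made-up sample data, not part of this repository.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cy", "Dee"],
        "age": [34, None, 29, 41],
        "city": ["Oslo", "Lima", None, "Oslo"],
    }
)

# Getting a list of the column headers and the row count.
columns = df.columns.tolist()
row_count = len(df)

# Counting missing values per column, then removing incomplete rows.
missing_per_column = df.isna().sum()
complete_rows = df.dropna()

# Filtering rows by a column value, and by multiple criteria.
in_oslo = df[df["city"] == "Oslo"]
oslo_over_35 = df[(df["city"] == "Oslo") & (df["age"] > 35)]

# Dropping all non-numeric columns.
numeric_only = df.select_dtypes(include="number")

# Sorting by a single column (missing values are placed last by default).
by_age = df.sort_values("age")

print(columns, row_count)
print(missing_per_column)
print(oslo_over_35)
```

Each step returns a new object and leaves `df` unchanged, which makes it easy to try the operations one at a time in a notebook.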
- Fork this repository using the button at the top. If you would like contributions to this repo to count toward the Data Science community, please give it a star and change the required code or upload any new files.
- Clone your forked repository to your PC (`$ git clone "url from clone option of this repo"`).
- Create a new branch for your modifications (i.e. `git branch new-user` and check it out with `git checkout new-user`, or do both in one step with `git checkout -b new-user`).
- Add your profile image in static/images/ (use the drag-and-drop option or upload by commands).
- Add your profile data in the Contributor folder.
- Add your files (`git add -A`), commit (`git commit -m "added myself"`) and push (`git push origin new-user`).
- Create a pull request.
- Star this repository.
- Follow me.
- Please don't use foul language toward anyone.
- Don't push the same programs again and again.
- We have a Discord server! This should be your first stop to talk with other learners. Why don't you introduce yourself right now?
- Discord link
- You can also interact through GitHub issues.
🔶 Kaggle : https://www.kaggle.com/
🔶 GitHub : https://www.github.com/
🔶 World Data : https://lnkd.in/ggkvXru7
🔶 Gov. Data : https://catalog.data.gov/
🔶 VisualData : https://visualdata.io/
🔶 Google Cloud Public Dataset : https://lnkd.in/gP5K63cG
🔶 Google Dataset Search : https://lnkd.in/gd39KBVQ
🔶 Reddit : https://lnkd.in/gfpfpGMF
🔶 Quandl Dataset : https://lnkd.in/guaQz6rn
🔶 UCI Machine Learning Dataset : https://lnkd.in/gd39KBVQ