End-to-end data analytics project using Python, SQL Server, and Tableau for ETL, storage, and visualization.
This project demonstrates an end-to-end data analytics workflow — from data extraction and transformation using Python, to storage and querying with SQL Server, and finally data visualization using Tableau. It aims to generate meaningful business insights through automation and reporting.
Data-pipeline-python-sql-tableau/
│
├── data/ # Raw or sample datasets
├── python/ # Python scripts for ETL
├── sql/ # SQL scripts (DDL/DML queries)
├── tableau/ # Tableau workbooks (.twb/.twbx files or screenshots)
├── README.md # Project documentation
└── requirements.txt # Python dependencies
• Python (Pandas, SQLAlchemy/pyodbc) – for data extraction and transformation (ETL)
• SQL Server – for data storage and querying
• Tableau – for building interactive dashboards
• Jupyter Notebook – for exploratory data analysis (EDA) and testing
• Automated ETL using Python
• Cleaned and normalized data stored in SQL Server
• Interactive dashboards in Tableau
• Real-world dataset (customizable)
• Modular codebase and reusable scripts
git clone https://github.com/Mahpara810/Data-pipeline-python-sql-tableau
cd Data-pipeline-python-sql-tableau
pip install -r requirements.txt
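Optionally, a quick sanity check (not part of the repo) can confirm that the dependencies imported correctly and that a SQL Server ODBC driver is visible to pyodbc; driver names vary by machine:

```python
import pandas
import pyodbc
import sqlalchemy

# Confirm the core packages are installed and importable.
print("pandas", pandas.__version__, "| SQLAlchemy", sqlalchemy.__version__)

# List the ODBC drivers pyodbc can see; you need a SQL Server driver here.
print("Available ODBC drivers:", pyodbc.drivers())
```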
The dataset used in this project, the IBM Telco Customer Churn dataset, is publicly available on Kaggle.
• Open the Jupyter Notebook:
notebooks/Telecom-customer-churn.ipynb
• This notebook performs:
- Data loading and cleaning
- Transformation into a star schema (fact and dimension tables)
- Loading the transformed data into the existing SQL Server tables (a minimal sketch of these steps follows this list)
• Ensure your SQL Server connection details (host, database, username, password) are correctly configured in the notebook.
• After connecting, verify that the fact and dimension tables have been loaded into SQL Server successfully.
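For reference, here is a minimal sketch of the cleaning and star-schema split the notebook performs. The column names follow the public Telco Customer Churn CSV, while the file path, surrogate key, and table layout are illustrative assumptions rather than the notebook's exact code:

```python
import pandas as pd

# Load the raw Kaggle CSV (path is an assumption; adjust to the file in data/).
raw = pd.read_csv("data/Telco-Customer-Churn.csv")

# Cleaning: TotalCharges is read as text and is blank for brand-new customers.
raw["TotalCharges"] = pd.to_numeric(raw["TotalCharges"], errors="coerce").fillna(0)

# Dimension tables: customer attributes and contract/billing attributes.
dim_customer = raw[["customerID", "gender", "SeniorCitizen", "Partner", "Dependents"]].drop_duplicates()
dim_contract = (
    raw[["Contract", "PaperlessBilling", "PaymentMethod"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
dim_contract["ContractKey"] = dim_contract.index + 1  # hypothetical surrogate key

# Fact table: measures plus keys back to the dimensions.
fact_churn = raw.merge(dim_contract, on=["Contract", "PaperlessBilling", "PaymentMethod"])[
    ["customerID", "ContractKey", "tenure", "MonthlyCharges", "TotalCharges", "Churn"]
]
```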
• Create a new database.
• Run the SQL scripts in the sql/ folder to create the required tables.
• Connect to SQL Server from the Jupyter Notebook using pyodbc or SQLAlchemy.
• Push the cleaned and transformed data from the notebook into the SQL Server tables using DataFrame.to_sql() or explicit SQL INSERT statements (see the sketch below).
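A minimal connection-and-load sketch, continuing from the transformation sketch above. The server, credentials, database name (TelcoChurn), and table names are placeholders; match them to your own SQL Server setup and the DDL in sql/:

```python
from sqlalchemy import create_engine

# pyodbc connection string for a local SQL Server instance (all values are placeholders).
engine = create_engine(
    "mssql+pyodbc://username:password@localhost/TelcoChurn"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# Append the transformed frames into the tables created by the scripts in sql/.
dim_customer.to_sql("DimCustomer", engine, if_exists="append", index=False)
dim_contract.to_sql("DimContract", engine, if_exists="append", index=False)
fact_churn.to_sql("FactChurn", engine, if_exists="append", index=False)
```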
• Open the .twb or .twbx file from the tableau/ folder using Tableau.
• Connect to your SQL Server database, or use the extracted data embedded in the workbook.
• Explore and analyze the visualized data insights to uncover trends, patterns, and key metrics.
• How many customers have churned, and what is the overall churn rate? (a quick pandas check for this is sketched after this list)
• Which customer groups (by gender, internet service type, and contract type) are most likely to churn?
• How does the type of internet service (DSL, Fiber Optic, or None) affect customer churn rates?
• Are customers on month-to-month contracts more likely to leave compared to one- or two-year contracts?
• Which payment methods are most commonly used, and do they influence customer retention?
• How much revenue is generated at different stages of a customer's tenure?
• Who are the top 10 highest-spending customers based on average monthly charges?
• Which type of internet service is the most popular among customers?
• What is the average customer tenure, and how does it relate to loyalty and churn?
• How many senior citizens live alone, and could this impact their likelihood of churning?
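As an illustration for the first two questions, a quick pandas check against the raw dataset might look like the following; the column names match the public Kaggle CSV, and the file path is an assumption:

```python
import pandas as pd

df = pd.read_csv("data/Telco-Customer-Churn.csv")  # path is an assumption

# Overall churn count and rate ("Churn" is a Yes/No column in the raw data).
churned = (df["Churn"] == "Yes").sum()
print(f"Churned customers: {churned} of {len(df)} ({churned / len(df):.1%})")

# Churn rate by contract type (month-to-month vs one- and two-year contracts).
print(df.groupby("Contract")["Churn"].apply(lambda s: (s == "Yes").mean()))
```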
This template can be adapted for:
• Sales Performance Analysis
• Customer Segmentation
• Financial Reporting
• Healthcare Data Analysis
• Marketing Campaign Insights
Telco Customer Churn Dataset: provided by IBM Sample Data Sets, this dataset includes information about a telecom company's customers, such as the services each customer has signed up for, customer account information, and whether the customer churned. It is commonly used for customer retention and churn prediction modeling.
• Pandas Documentation
• NumPy Documentation
• Matplotlib Documentation
• Seaborn Documentation
• Scikit-learn Documentation
• IBM Developer – Predict customer churn with SciKit-Learn
• Medium Article – Customer Churn Prediction with Telco Dataset
• Thanks to IBM Sample Data Sets for providing the Telco Customer Churn dataset.
• Special thanks to Kaggle for hosting and making the dataset easily accessible to the data science community.
• Gratitude to the open-source community behind tools like Pandas, NumPy, Matplotlib and Seaborn.
• Appreciation to the authors of various tutorials and blog posts that helped guide the data exploration and model building process.
Contributions are welcome! Please fork the repo, make your changes, and submit a pull request.
This project is licensed under the MIT License.