This project analyzes rental vehicle data to surface patterns in customer behavior, preferred rental locations, and the popularity of particular car models. The goal is to give vehicle rental companies the information they need to improve customer experience, streamline operations, refine pricing policies, and target advertising campaigns. The analysis looks for the key indicators that drive rental decisions across a wide range of customer preferences. Rental location analysis is a central component, since it shows which regional characteristics drive demand. By identifying the agencies and models that are most popular with customers, rental companies can expand or diversify their fleets to better meet market demand.
We launched an AWS EC2 RHEL 9 instance to install and configure the tools required for the project.
ssh -i .ssh/"kafka.pem" ec2-user@ec2-44-202-119-172.compute-1.amazonaws.com
- Installation
wget https://downloads.apache.org/kafka/3.6.0/kafka_2.12-3.6.0.tgz
Extract the archive and move the Kafka directory to /opt/kafka
tar xzf kafka_2.12-3.6.0.tgz
mv kafka_2.12-3.6.0 /opt/kafka
- Configure the Kafka properties and copy the systemd unit files so that Kafka and ZooKeeper can run as services
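The unit files themselves are not listed in this guide, so the following is only a minimal sketch of what they might look like. It assumes Kafka is installed in /opt/kafka as above and runs as the kafka user; the staging directory name `kafka-units` is a placeholder. The units are written locally so they can be reviewed before being installed.

```shell
# Stage the unit files locally; nothing is installed system-wide here.
staging_dir="${STAGING_DIR:-./kafka-units}"   # hypothetical staging location
mkdir -p "$staging_dir"

# ZooKeeper unit, using the scripts and config bundled with Kafka.
cat > "$staging_dir/zookeeper.service" <<'EOF'
[Unit]
Description=Apache ZooKeeper (bundled with Kafka)
After=network.target

[Service]
Type=simple
User=kafka
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
SuccessExitStatus=143
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Kafka broker unit; it pulls in ZooKeeper and starts after it.
cat > "$staging_dir/kafka.service" <<'EOF'
[Unit]
Description=Apache Kafka broker
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
SuccessExitStatus=143
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
```

After review, the staged files can be copied to /etc/systemd/system with sudo, followed by `sudo systemctl daemon-reload` so that systemd picks them up.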
- Enable and start the ZooKeeper service first, because the Kafka broker in this setup connects to ZooKeeper at startup
systemctl enable zookeeper
systemctl start zookeeper
- Enable and start the Kafka service
systemctl enable kafka
systemctl start kafka
Check the status of both Kafka and ZooKeeper
systemctl status kafka
systemctl status zookeeper
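For a quicker yes/no answer than the full status output, a small helper along these lines may be convenient. It is a sketch that assumes the two units are named `zookeeper` and `kafka`, as in the commands above; the function name `check_services` is made up for this example.

```shell
# Print one line per service saying whether systemd reports it as active.
check_services() {
  for svc in zookeeper kafka; do
    if systemctl is-active --quiet "$svc"; then
      echo "$svc: active"
    else
      echo "$svc: NOT active"
    fi
  done
}
```

Run `check_services` a few seconds after starting the services, since the broker needs a moment to come up.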
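- Install Docker
dnf install docker
- Check the container service. On RHEL 9 the docker package is podman-docker, a wrapper that forwards docker commands to Podman, so the service to check is podman
systemctl status podman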
Search for the PostgreSQL image
docker search postgres
Pull the PostgreSQL image
docker pull docker.io/library/postgres
Verify the image
docker images
Run a container to start PostgreSQL
docker run --name some-postgres -e POSTGRES_PASSWORD_FILE=/run/secrets/postgres-passwd -d postgres
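`POSTGRES_PASSWORD_FILE` tells the image to read the password from a file inside the container, so that file has to be mounted for the command above to work. The sketch below shows one way to do this. The host-side file name `postgres-passwd`, the randomly generated password, the loopback port binding, and the function name `run_postgres` are all assumptions for illustration, not part of the original setup.

```shell
# Create a password file readable only by the current user (if missing).
secret_file="${SECRET_FILE:-$PWD/postgres-passwd}"   # hypothetical location
if [ ! -s "$secret_file" ]; then
  ( umask 077; head -c 24 /dev/urandom | base64 > "$secret_file" )
fi

# Start PostgreSQL with the secret mounted read-only at the path that
# POSTGRES_PASSWORD_FILE points to. The ",Z" option relabels the file for
# SELinux, which is enforcing by default on RHEL. The port is published on
# the loopback interface only (an assumption; adjust if remote access is needed).
run_postgres() {
  docker run --name some-postgres \
    -e POSTGRES_PASSWORD_FILE=/run/secrets/postgres-passwd \
    -v "$secret_file:/run/secrets/postgres-passwd:ro,Z" \
    -p 127.0.0.1:5432:5432 \
    -d postgres
}
```

Calling `run_postgres` on the instance starts the container, and `docker logs some-postgres` shows whether database initialization succeeded.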
Verify that pgAdmin 4 is reachable in a browser
http://44.202.119.172/pgadmin4/browser/
- Switch to the kafka user on the EC2 instance to configure the ETL scripts
sudo su - kafka
- Clone the project from GitHub
git clone https://github.com/AkithaPinisetti2107/DCSC_Final_Project.git
- Change into the DCSC project directory
cd DCSC_Final_Project
- Run the scripts with python3 to test them
python3 DCSC_kafka_data_producer.py > /dev/null
Here the console output is redirected to /dev/null to discard it.
- Validate the producer's messages with the Kafka console consumer command
/opt/kafka/bin/kafka-console-consumer.sh --topic dcsc --bootstrap-server localhost:9092 --from-beginning
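The command above keeps running until it is interrupted. For a quick spot check, a bounded variant can be handy. The sketch below wraps the same console consumer with its `--max-messages` and `--timeout-ms` options; the function name `peek_topic` and the defaults of 5 messages and 10 seconds are arbitrary choices for this example.

```shell
# Print at most a few messages from a topic, then exit.
# Usage: peek_topic [topic] [max_messages]
peek_topic() {
  /opt/kafka/bin/kafka-console-consumer.sh \
    --topic "${1:-dcsc}" \
    --bootstrap-server localhost:9092 \
    --from-beginning \
    --max-messages "${2:-5}" \
    --timeout-ms 10000   # give up if no message arrives within 10 seconds
}
```

`peek_topic` reads from the `dcsc` topic by default, and `peek_topic dcsc 20` shows up to 20 messages.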
- Once you confirm that messages are flowing through Kafka, run the PostgreSQL loader script to load the Kafka JSON messages into the PostgreSQL database
python3 DCSC_kafka_psql.py
- Validate the data in the PostgreSQL database, either through the pgAdmin 4 UI or simply by running psql commands from the EC2 instance.
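One way to run psql from the instance is through the container itself. The sketch below assumes the container is named `some-postgres` as in the run command above and that the image's default `postgres` superuser is in use; the function name `pg_check` is made up for this example. The table names are not given in this guide, so the default action simply lists the tables.

```shell
# Run one SQL statement or psql meta-command inside the PostgreSQL container.
# With no argument it lists the tables (\dt).
pg_check() {
  query="${1:-}"
  if [ -z "$query" ]; then
    query='\dt'
  fi
  docker exec some-postgres psql -U postgres -c "$query"
}
```

After `pg_check` shows which tables exist, a row count such as `pg_check 'SELECT count(*) FROM your_table;'` (with `your_table` replaced by a real table name) confirms that data is arriving.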
Once the data is confirmed, we can schedule the jobs using crontab to run the ETL scripts periodically.
crontab -e # this command opens the crontab entries for the kafka user
# produce messages to Kafka at the start of every hour
0 * * * * cd /home/kafka/DCSC_Final_Project && python3 DCSC_kafka_data_producer.py > /dev/null
# consume Kafka messages and load them into PostgreSQL at the 10th minute of every hour
# (use */10 * * * * instead to run every 10 minutes)
10 * * * * cd /home/kafka/DCSC_Final_Project && python3 DCSC_kafka_psql.py > /dev/null
Validate crontab
crontab -l
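A cron schedule has five fields in a fixed order: minute, hour, day of month, month, and day of week. It is easy to put a value in the wrong position, so a tiny helper that labels the fields can serve as a sanity check before saving an entry. This is only an illustrative sketch; `describe_cron` is a made-up name and the function does not validate the values.

```shell
# Label the five fields of a cron schedule.
describe_cron() {
  set -f            # disable globbing so "*" stays literal
  set -- $1         # split the schedule on whitespace into $1..$5
  set +f
  printf 'minute=%s hour=%s day_of_month=%s month=%s day_of_week=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

describe_cron '0 * * * *'   # minute 0 of every hour
describe_cron '0 0 * * *'   # minute 0 of hour 0, i.e. once a day at midnight
```

The first call prints `minute=0 hour=* day_of_month=* month=* day_of_week=*`, an hourly schedule, while the second fixes the hour to 0 as well and therefore runs only once a day.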
Connect Tableau to the PostgreSQL database for visualization.
![image](https://private-user-images.githubusercontent.com/152043128/292009993-0879e9a8-d545-4377-a935-dce7c6e9ee8d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjIzNDUwODgsIm5iZiI6MTcyMjM0NDc4OCwicGF0aCI6Ii8xNTIwNDMxMjgvMjkyMDA5OTkzLTA4NzllOWE4LWQ1NDUtNDM3Ny1hOTM1LWRjZTdjNmU5ZWU4ZC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzMwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDczMFQxMzA2MjhaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT05MTBkZGEzZGQ5YWNmOGQ0NTBiYjk2N2Q1MzViMWY3Y2Q0N2M3Y2JlMjdhYTg0MDcwZmQyNWFjMjhjNWE1YTRjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.7Q0x1DiaWhpBEQmsztXvi2hZV0GhXuK-H1U3utdoj3E)