- Create the project template by executing the template.py file.
- Write the code in setup.py and pyproject.toml so that local packages can be imported.
- Create a virtual env, activate it, and install the requirements from requirements.txt.
i) Create a virtual environment named 'vehicle':
python -m venv vehicle
ii) Activate the environment (Windows, cmd):
vehicle\Scripts\activate.bat
iii) To deactivate the environment:
deactivate
iv) Add the required modules to requirements.txt, then install them:
pip install -r requirements.txt
- Run "pip list" in the terminal to make sure the local packages are installed.
- Include -e . in requirements.txt to install all the local packages from the src folder into your "vehicle" environment.
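As an alternative to scanning the "pip list" output by eye, the check can be done from Python. This is a minimal sketch using only the standard library; the function name `is_installed` is made up for illustration, and the distribution name to pass in is whatever name your setup.py or pyproject.toml declares.

```python
from importlib import metadata


def is_installed(distribution_name):
    """Return True if pip reports a distribution with this name in the active environment."""
    try:
        metadata.version(distribution_name)
        return True
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    # Replace the placeholder with the name declared in setup.py / pyproject.toml.
    print(is_installed("<your_local_package_name>"))
```

Run it with the 'vehicle' environment activated, otherwise it inspects a different interpreter's packages.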
- Create a .env file and add the MongoDB connection string:
CONNECTION_URL = <mongo-db_connection_string>
- Sign up for MongoDB Atlas and create a new project: give it a name, then click next, next, create.
- From the "Create a cluster" screen, hit "create", select the M0 service while keeping the other settings at their defaults, then hit "create deployment". (NOTE: data is stored inside clusters.)
- Set up the username and password, then create the DB user.
- Go to "network access" and add the IP address "0.0.0.0/0" so that the database can be accessed from anywhere.
- Go back to the project >> "Get Connection String" >> "Drivers" >> {Driver: Python, Version: 3.12 or later}, then copy and save the connection string (replace <db_password> with your password). >> Done.
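When replacing <db_password>, note that passwords containing characters such as "@" or "/" need to be percent-encoded before they go into a MongoDB connection string. The sketch below shows one way to do that with the standard library; the helper name `fill_password` and the example host are hypothetical.

```python
from urllib.parse import quote_plus


def fill_password(connection_string, password, placeholder="<db_password>"):
    """Swap the placeholder for a percent-encoded password."""
    if placeholder not in connection_string:
        raise ValueError(f"Placeholder {placeholder} not found in the connection string.")
    return connection_string.replace(placeholder, quote_plus(password))


if __name__ == "__main__":
    # Made-up connection string, for illustration only.
    template = "mongodb+srv://user:<db_password>@cluster.example.net/"
    print(fill_password(template, "p@ss/word"))
```

The resulting string is what goes into the .env file.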
- Create folder "notebook" >> do step 7 >> create file "mongoDB_demo.ipynb" >> select kernel > Python kernel > vehicle.
- Add the dataset to the notebook folder.
- Push your data to the MongoDB database from your Python notebook. Data has to be uploaded to MongoDB in key-value format.
- Go to MongoDB Atlas >> Database >> browse collection >> see your data in key-value format.
Organisation -> Project -> Cluster -> Database -> Collection
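The upload step can be sketched roughly as follows. This is not the notebook's actual code: the helper names and the sample columns are made up, and `upload_documents` only assumes an object with an `insert_many` method that returns a result carrying `inserted_ids`, which is how a pymongo collection behaves.

```python
import csv
import io


def csv_to_documents(csv_text):
    """Parse CSV text into a list of dicts, one key-value document per row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


def upload_documents(collection, documents):
    """Insert the documents and return how many were written."""
    if not documents:
        # pymongo rejects an empty insert_many call, so skip it.
        return 0
    result = collection.insert_many(documents)
    return len(result.inserted_ids)


if __name__ == "__main__":
    # Made-up sample rows; csv.DictReader keeps every value as a string.
    sample = "id,Gender\n1,Male\n2,Female\n"
    print(csv_to_documents(sample))
```

In a notebook that already holds the dataset in a pandas DataFrame `df`, `df.to_dict(orient="records")` produces the same list-of-dicts shape while preserving column types.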
- View Database
- View Collections
- Data Successfully uploaded to MongoDB Atlas Database
Logged the exception
- constant
- config_entity
- artifact_entity
- component
- pipeline
- app.py / demo.py
- Before working on the "Data Ingestion" component >> declare variables within the constants.__init__.py file.
- Add code to the configuration.mongo_db_connection.py file and define the class for the MongoDB connection.
- Inside the "data access" folder, add code to proj1_data.py that uses mongo_db_connection.py. It creates a connection and fetches the data from there.
- This code connects to the database, fetches the data in key-value format, and transforms it into a pandas DataFrame.
- Add code to the entity.config_entity.py file up to the DataIngestionConfig class. NOTE: Set MONGODB_URL in the command prompt using
set VAR_NAME=VALUE
To check that it is set:
echo %MONGODB_URL%
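The connection class and the data-access step described above might look roughly like the sketch below. It is an illustration under assumptions, not the project's actual code: the names `MongoDBClient`, `fetch_documents`, and `documents_to_dataframe` are hypothetical, and the database and collection names are left as placeholders. pymongo and pandas are imported lazily inside the functions that need them.

```python
import os


class MongoDBClient:
    """Hypothetical wrapper that shares one MongoClient across instances."""

    _client = None  # created on first use, then reused

    def __init__(self, database_name, env_var="MONGODB_URL"):
        if MongoDBClient._client is None:
            url = os.getenv(env_var)
            if not url:
                raise RuntimeError(f"Environment variable {env_var} is not set.")
            # Imported here so this module still loads where pymongo is absent.
            from pymongo import MongoClient
            MongoDBClient._client = MongoClient(url)
        self.database_name = database_name
        self.database = MongoDBClient._client[database_name]


def fetch_documents(collection):
    """Return every document as a plain dict, without MongoDB's _id field."""
    return [
        {key: value for key, value in doc.items() if key != "_id"}
        for doc in collection.find()
    ]


def documents_to_dataframe(documents):
    """Turn a list of key-value documents into a pandas DataFrame."""
    import pandas as pd  # third-party; only needed for this step
    return pd.DataFrame(documents)
```

With MONGODB_URL set, usage would be along the lines of `documents_to_dataframe(fetch_documents(MongoDBClient("<database_name>").database["<collection_name>"]))`. Dropping `_id` keeps MongoDB's internal identifier out of the resulting DataFrame.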
Data Ingestion output





