# Build Docker Images

1. Install the **Docker Desktop** application on your system.

   - Start Docker Desktop so that the Docker daemon is running.

2. Clone the code into any folder and open that folder in VS Code.

3. Check the installed version with `docker --version`.

4. List the local images with `docker images`.
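
The two checks above can be wrapped in a small script. This is a minimal sketch: `check_docker` is a hypothetical helper name, and it assumes that `docker images` failing means the daemon has not been started.

```shell
# check_docker is a hypothetical helper, not part of Docker itself.
check_docker() {
    # Is the docker CLI on the PATH?
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker CLI not found: install Docker Desktop first"
        return 1
    fi
    docker --version
    # `docker images` needs a running daemon, so it doubles as a liveness check.
    if docker images >/dev/null 2>&1; then
        echo "Docker daemon is running"
    else
        echo "Docker daemon is not reachable: start Docker Desktop"
        return 2
    fi
}

check_docker || echo "Docker is not ready yet"
```

If the last message appears, start Docker Desktop and run the script again.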

We have to create a **Docker image** that bundles the source code together with **Airflow**. Airflow does not run natively on *Windows*; it runs well on Linux and is easy to set up there, which is why we will run *Airflow* inside Docker.

The benefit of using Docker is that it isolates your application and its dependencies from the underlying infrastructure.

What are your application dependencies?

- linux environment
- python environment
- `requirements.txt` file
- Apache Airflow

Docker isolates this whole stack from the outside environment; that is the benefit of using Docker.

So we have to create a **Dockerfile**:

```Dockerfile
FROM python:3.8
USER root
RUN mkdir /app 
COPY . /app/
WORKDIR /app/
RUN pip3 install -r requirements.txt 
ENV AIRFLOW_HOME="/app/airflow"
ENV AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=1000
ENV AIRFLOW__CORE__ENABLE_XCOM_PICKLING=True
RUN airflow db init
RUN airflow users create -e rahulshelke3399@gmail.com -f Rahul -l Shelke -p admin -r Admin -u admin

RUN chmod 777 start.sh
RUN apt update -y && apt install awscli -y
ENTRYPOINT ["/bin/sh"]
CMD ["start.sh"]
```

- `FROM python:3.8` : uses *python:3.8* as the base image.
- `USER root` : runs the following instructions as the root user.
- `RUN mkdir /app` : creates an `app` directory at the root of the image's filesystem.
- `COPY . /app/` : copies the files in the current directory (`.`, the build context) into the `/app/` folder.
- `WORKDIR /app/` : sets the working directory to `/app/`.
- `RUN pip3 install -r requirements.txt` : installs the packages listed in `requirements.txt`, which also installs `apache-airflow`.
    - To use `apache-airflow`, we have to set the `AIRFLOW_HOME` directory.

- `ENV AIRFLOW_HOME="/app/airflow"` : Airflow keeps its files in an `airflow` folder inside `/app`. In a Dockerfile there must be no spaces around `=`, otherwise the `=` becomes part of the value.

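- `ENV AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=1000` : how many seconds Airflow waits while importing DAG files before timing out, because the code sometimes takes a long time to load (1000 seconds is about 17 minutes). The option is `dagbag_import_timeout` in the `[core]` section, so there is a single underscore between `DAGBAG` and `IMPORT`.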

- `ENV AIRFLOW__CORE__ENABLE_XCOM_PICKLING=True` : when we pass information from one component to another, some of the objects may have to be serialized with pickle; setting this environment variable to `True` allows that. Note the spelling `PICKLING`.

    - In Airflow, XComs allow tasks to exchange small amounts of data. For example, Task A can push a value, and Task B can pull that value and use it.
    - With pickling enabled, you can pass complex Python objects between tasks (e.g., NumPy arrays, Pandas DataFrames, or custom classes).
    - The data is then serialized using pickle, Python’s object serialization module.

- `RUN airflow db init` : Airflow is a scheduler, and every scheduler needs a database to store user information, manage the access hierarchy, and keep track of jobs (which job ran when) along with other related configuration. By default it uses `SQLite`, but other databases can be configured.

- `RUN airflow users create -e <user-email> -f <first-name> -l <last-name> -p <password> -r <role> -u <user-name>` : the values are hard-coded in the Dockerfile; to log in to Airflow you will need the `<user-name>` and `<password>`.

- `RUN chmod 777 start.sh` : changes the permissions of `start.sh` so that anyone (owner, group, others) can **read, write,** and **execute** it.

- `RUN apt update -y && apt install awscli -y` : we are also going to use an *AWS S3* bucket, which is why we install `awscli`.

- `ENTRYPOINT ["/bin/sh"]` : the executable that runs when the container starts, here the `sh` shell.

- `CMD ["start.sh"]` : the argument passed to the entry point, so the container runs `/bin/sh start.sh`.
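
Airflow reads configuration overrides from environment variables named `AIRFLOW__<SECTION>__<KEY>`, with double underscores only around the section name. The sketch below rebuilds the two `[core]` variable names used in the Dockerfile from their option names, which makes spelling mistakes easy to spot. `airflow_env_name` is a hypothetical helper for illustration, not an Airflow command.

```shell
# airflow_env_name is a hypothetical helper, not an Airflow command.
# It builds AIRFLOW__<SECTION>__<KEY> from a config section and option name.
airflow_env_name() {
    section=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    key=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    printf 'AIRFLOW__%s__%s\n' "$section" "$key"
}

airflow_env_name core dagbag_import_timeout   # AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT
airflow_env_name core enable_xcom_pickling    # AIRFLOW__CORE__ENABLE_XCOM_PICKLING
```

A misspelled variable is not reported as an error; Airflow simply keeps the default value for that option.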


By default, Airflow stores XCom data in the metadata database using JSON serialization; pickle is used only when XCom pickling is enabled.
To run Airflow, we have to start two things:
1. Dashboard (the Airflow web server)
2. Airflow Scheduler

To start both, we will write one shell script.

*start.sh* : contains the commands that start the **Airflow Scheduler** and the **Airflow Web Server**.

```bash
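#!/bin/sh

# Start the scheduler in the background, detached from the terminal.
nohup airflow scheduler &
# Start the web server in the foreground so the container keeps running.
airflow webserver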
```

In VS Code, select End of Line Sequence $\longrightarrow$ **LF** (not CRLF); only then will this shell script execute inside the Linux container.
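
If the file was already saved with Windows line endings, it can also be converted from the command line. The following is a minimal sketch that works on a throwaway file created with `mktemp`; in the project the target would be `start.sh`. `has_cr` is a hypothetical helper name.

```shell
# Demo on a throwaway file; in the project the target would be start.sh.
script=$(mktemp)
printf '#!/bin/sh\r\necho hello\r\n' > "$script"

# Succeeds when the file contains at least one carriage return.
has_cr() {
    grep -q "$(printf '\r')" "$1"
}

has_cr "$script" && echo "before: CRLF line endings found"

# Strip every carriage return, leaving plain LF line endings.
tr -d '\r' < "$script" > "$script.lf" && mv "$script.lf" "$script"

has_cr "$script" || echo "after: LF only"
sh "$script"
```

The same `tr -d '\r'` step can be applied to `start.sh` before building the image.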

Create the Docker image by running the following command in the terminal, from the folder that contains the Dockerfile:

`docker build -t <image-name>:<tag> .`

The Docker image build will then start.
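
As a rough sketch of the full cycle, the build can be followed by a run that publishes the web server port. The image name `airflow-demo`, the tag `latest`, and the function name `build_and_run` are assumptions made for illustration; port 8080 is the default port of the Airflow web server.

```shell
# Hypothetical names for illustration; pick your own image name and tag.
IMAGE_NAME="airflow-demo"
IMAGE_TAG="latest"

build_and_run() {
    # Build from the current folder (the build context that holds the Dockerfile).
    docker build -t "$IMAGE_NAME:$IMAGE_TAG" . || return 1
    # The new image should now appear in the local image list.
    docker images "$IMAGE_NAME"
    # Publish the web server port; 8080 is the Airflow webserver default.
    docker run --rm -p 8080:8080 "$IMAGE_NAME:$IMAGE_TAG"
}
```

Call `build_and_run` from the project folder; once the container is up, the dashboard should be reachable at `http://localhost:8080` with the user name and password set in the Dockerfile.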