# 🚀 Setting Up Apache Beam in Google Colab

Now that we are done with the theory, let’s **move on to the practical part**.  
Before we write our first Beam pipeline, we need a working setup.  

---

## 1️⃣ Why Colab for Apache Beam?

- Installing Beam **locally** can be tricky:  
  - Setting up Python  
  - Managing versions & dependencies  
  - Resolving path & environment issues  

- To avoid these issues, we will use **Google Colab**.  

👉 **What is Colab?**  
- A hosted, Jupyter-like notebook environment provided by Google.  
- Runs Python code on a **Google Virtual Machine**.  
- Provides **RAM and disk** for execution.  
- No installation needed; just open it in a browser.  
- Supports saving notebooks in **Google Drive**.  

💡 Think of Colab as an **online Jupyter Notebook** where setup is already handled.  
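
To see what the VM behind a notebook actually provides, a small check like the one below can help. It is a minimal sketch using only the standard library: `describe_runtime` is a hypothetical helper name, the `google.colab` lookup is a common heuristic rather than an official API, and the RAM figure relies on `os.sysconf`, which exists only on Unix-like systems (it stays `None` elsewhere).

```python
import os
import platform
import shutil
import sys


def describe_runtime(path="."):
    """Collect basic facts about the machine running this notebook."""
    disk = shutil.disk_usage(path)  # total, used and free space in bytes
    info = {
        # Heuristic (assumption): Colab kernels load the google.colab module.
        "in_colab": "google.colab" in sys.modules,
        "python": platform.python_version(),
        "os": platform.system(),
        "disk_total_gb": round(disk.total / 1e9, 1),
        "disk_free_gb": round(disk.free / 1e9, 1),
        "ram_gb": None,
    }
    try:
        # Physical memory = page size * number of pages (Unix-like systems only).
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
        info["ram_gb"] = round(page_size * page_count / 1e9, 1)
    except (AttributeError, ValueError, OSError):
        pass  # os.sysconf is missing or does not know these names
    return info


for key, value in describe_runtime().items():
    print(f"{key}: {value}")
```

Running the same cell locally and in Colab makes the difference between the two environments visible before any Beam code is involved.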

---

## 2️⃣ Installing Apache Beam in Colab

In Colab, to run **shell commands** we prefix them with `!`.  
For example, to install Beam:

```python
!pip install --quiet apache-beam
```

Without `--quiet`, pip prints its full log. The output below was captured on a local Windows machine (note the `win_amd64` wheels and `python.exe`), so the file names will differ on Colab's Linux VM:

```python
!pip install apache-beam
```

```text
Collecting apache-beam
  Downloading apache_beam-2.68.0-cp311-cp311-win_amd64.whl.metadata (20 kB)
Collecting crcmod<2.0,>=1.7 (from apache-beam)
  Downloading crcmod-1.7.tar.gz (89 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting orjson<4,>=3.9.7 (from apache-beam)
  Downloading orjson-3.11.3-cp311-cp311-win_amd64.whl.metadata (43 kB)
...

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mysql-connector-python 8.0.33 requires protobuf<=3.20.3,>=3.11.0, but you have protobuf 4.25.8 which is incompatible.
pylint 2.17.4 requires dill>=0.3.6; python_version >= "3.11", but you have dill 0.3.1.1 which is incompatible.

[notice] A new release of pip is available: 24.0 -> 25.2
[notice] To update, run: python.exe -m pip install --upgrade pip
```

The `ERROR` lines report version conflicts with other packages already installed in that environment (`mysql-connector-python`, `pylint`).  

---

## 3️⃣ Creating a Data Directory

Next, create a `data` directory.  
The `-p` flag belongs to the Unix `mkdir`, so it works on Colab. The Windows shell treats `-p` as a directory name, which is what the captured output shows:

```python
!mkdir -p data
```

```text
A subdirectory or file -p already exists.
Error occurred while processing: -p.
A subdirectory or file data already exists.
Error occurred while processing: data.
```

On Windows, drop the flag. The message below only means the directory already exists:

```python
!mkdir data
```

```text
A subdirectory or file data already exists.
```

List the working directory to confirm (`!dir` on Windows, `!ls` on Colab). Note the stray `-p` directory created by the first command:

```python
!dir
```

```text
 Volume in drive C is Windows
 Volume Serial Number is 96E0-2891

 Directory of c:\Users\bhask\AppData\Local\Programs\Microsoft VS Code

26-09-2025  18:12    <DIR>          .
01-09-2025  16:26    <DIR>          ..
26-09-2025  18:12    <DIR>          -p
26-09-2025  10:36    <DIR>          appx
23-09-2025  12:35    <DIR>          bin
18-09-2025  00:07           167,282 chrome_100_percent.pak
18-09-2025  00:07           258,304 chrome_200_percent.pak
18-09-2025  00:11       197,459,000 Code.exe
18-09-2025  00:07               367 Code.VisualElementsManifest.xml
18-09-2025  00:10         4,927,032 d3dcompiler_47.dll
26-09-2025  18:12    <DIR>          data
18-09-2025  00:11         2,882,080 ffmpeg.dll
18-09-2025  00:07        10,467,680 icudtl.dat
18-09-2025  00:09           515,616 libEGL.dll
18-09-2025  00:10         8,061,472 libGLESv2.dll
18-09-2025  00:07        15,091,641 LICENSES.chromium.html
23-09-2025  12:35    <DIR>          locales
23-09-2025  12:35    <DIR>          policies
```
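
Both follow-up checks can also be done in plain Python, which sidesteps the shell differences seen in the `mkdir` output. The sketch below assumes Python 3.8 or newer; `installed_version` and `ensure_dir` are hypothetical helper names, not part of Beam. It looks up the installed `apache-beam` version through `importlib.metadata` and creates `data` with `pathlib`, where `exist_ok=True` plays the role of `mkdir -p`.

```python
from importlib import metadata
from pathlib import Path


def installed_version(package="apache-beam"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def ensure_dir(name="data"):
    """Create the directory if needed; an existing directory is not an error."""
    path = Path(name)
    path.mkdir(parents=True, exist_ok=True)
    return path


print("apache-beam version:", installed_version())
print("data directory ready:", ensure_dir("data").is_dir())
```

If the first line prints `None`, the install did not land in the Python environment the notebook kernel is using.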
