# Mega Notebook (06 → 12): Full ALPR System (YOLOv8 + EasyOCR)

This notebook contains the complete pipeline for Automatic License Plate Recognition (ALPR):

1. Plate detection using YOLOv8
2. Plate cropping (ROI extraction)
3. OCR preprocessing (contrast + thresholding)
4. Text extraction using EasyOCR
5. Feature engineering (regex validation + confidence fusion)
6. End-to-end inference on images
7. Video pipeline (optional)
8. Evaluation + export

The dataset and the trained model weights are loaded from Google Drive so that they persist across Colab sessions.


## Section 0 — Setup and Paths

This section installs required packages, mounts Google Drive, imports libraries, and defines all required paths.


In [2]:
!pip install -q ultralytics easyocr opencv-python pandas numpy matplotlib


In [3]:
from google.colab import drive
drive.mount("/content/drive")


Mounted at /content/drive


In [4]:
import os,glob,re,time
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from ultralytics import YOLO


Creating new Ultralytics Settings v0.0.6 file ✅ 
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.


In [5]:
DATASET_DIR="/content/drive/MyDrive/ALPR_DATASET/Indian-License-Plate-1"
MODEL_PATH="/content/drive/MyDrive/ALPR_MODELS/plate_yolov8n_best.pt"

OUT_DIR="/content/alpr_outputs"
CROP_DIR=os.path.join(OUT_DIR,"crops")
ANN_DIR=os.path.join(OUT_DIR,"annotated")
os.makedirs(CROP_DIR,exist_ok=True)
os.makedirs(ANN_DIR,exist_ok=True)

TEST_IMG_DIR=os.path.join(DATASET_DIR,"test","images")

print("DATASET exists:",os.path.exists(DATASET_DIR))
print("MODEL exists:",os.path.exists(MODEL_PATH))
print("TEST_IMG_DIR exists:",os.path.exists(TEST_IMG_DIR))
print("OUT_DIR:",OUT_DIR)


DATASET exists: True
MODEL exists: True
TEST_IMG_DIR exists: True
OUT_DIR: /content/alpr_outputs


## Section 6 — Plate Detection + Cropper

This section:
- Loads YOLOv8 trained weights
- Detects the best plate bounding box per image (Top-1)
- Crops the plate region with padding
- Saves plate crops for OCR processing


In [6]:
detector=YOLO(MODEL_PATH)
print("YOLO detector loaded")


YOLO detector loaded


In [7]:
def crop_plate(img,xyxy,pad=10):
    h,w=img.shape[:2]
    x1,y1,x2,y2=map(int,xyxy)
    x1=max(0,x1-pad); y1=max(0,y1-pad)
    x2=min(w,x2+pad); y2=min(h,y2+pad)
    return img[y1:y2,x1:x2]
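
To see how the padding in crop_plate behaves near image borders, the clamping arithmetic can be isolated from the image itself. This is an illustrative sketch only: `padded_box` is a hypothetical helper that is not part of the notebook, and the box and image size are made-up values.

```python
def padded_box(xyxy, img_w, img_h, pad=10):
    """Expand a box by `pad` pixels per side, clamped to the image bounds.

    Mirrors the arithmetic in crop_plate; hypothetical helper for illustration.
    """
    x1, y1, x2, y2 = map(int, xyxy)
    x1 = max(0, x1 - pad)
    y1 = max(0, y1 - pad)
    x2 = min(img_w, x2 + pad)
    y2 = min(img_h, y2 + pad)
    return x1, y1, x2, y2

# A box near the top-left corner of a made-up 100x30 image:
# left, top and bottom edges hit the border, only the right edge gets the full pad.
box = padded_box((5.7, 4.2, 50.9, 25.0), img_w=100, img_h=30)
print(box)  # (0, 0, 60, 30)
```

Note that `int()` truncates the float coordinates before padding, so the crop can start up to one pixel inside the detected box on each side; the 10-pixel pad more than covers that.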


In [8]:
test_imgs=glob.glob(os.path.join(TEST_IMG_DIR,"*.jpg"))+glob.glob(os.path.join(TEST_IMG_DIR,"*.png"))+glob.glob(os.path.join(TEST_IMG_DIR,"*.jpeg"))
print("Total test images:",len(test_imgs))


Total test images: 164


In [9]:
crop_rows=[]
saved=0

for p in test_imgs:
    img=cv2.imread(p)
    if img is None:  # cv2.imread returns None for unreadable files
        continue
    r=detector.predict(p,conf=0.05,verbose=False)[0]

    if r.boxes is None or len(r.boxes)==0:
        continue

    k=int(r.boxes.conf.argmax())
    xyxy=r.boxes.xyxy[k].cpu().numpy()

    crop=crop_plate(img,xyxy,pad=10)
    outp=os.path.join(CROP_DIR,os.path.basename(p))
    cv2.imwrite(outp,crop)

    crop_rows.append([os.path.basename(p),outp])
    saved+=1

print("Saved crops:",saved)
print("CROP_DIR:",CROP_DIR)


Saved crops: 58
CROP_DIR: /content/alpr_outputs/crops


## Section 7 — OCR Preprocessing

This section aims to improve OCR accuracy by applying:
- Grayscale conversion
- Upscaling (2x)
- CLAHE contrast enhancement
- Gaussian denoising
- Adaptive thresholding

Output: an enhanced grayscale image and a binarised (thresholded) image, both usable as EasyOCR input


In [10]:
def ocr_preprocess(bgr):
    g=cv2.cvtColor(bgr,cv2.COLOR_BGR2GRAY)
    g=cv2.resize(g,None,fx=2,fy=2,interpolation=cv2.INTER_CUBIC)
    clahe=cv2.createCLAHE(2.0,(8,8))
    g=clahe.apply(g)
    g=cv2.GaussianBlur(g,(3,3),0)
    th=cv2.adaptiveThreshold(g,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,cv2.THRESH_BINARY,31,5)
    return g,th


## Section 8 — EasyOCR Text Extraction

This section:
- Loads EasyOCR
- Reads text from cropped plate images
- Cleans the output text
- Returns OCR text + confidence score


In [11]:
import easyocr
reader=easyocr.Reader(['en'],gpu=False)
print("EasyOCR loaded")





In [12]:
def clean_plate_text(s):
    s=s.upper()
    s=re.sub(r"[^A-Z0-9]","",s)
    return s


In [13]:
def read_easyocr(img):
    out=reader.readtext(img,detail=1)
    if not out:
        return "",0.0
    out=sorted(out,key=lambda x:x[2],reverse=True)
    txt=clean_plate_text(out[0][1])
    conf=float(out[0][2])
    return txt,conf
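
read_easyocr keeps only the single highest-confidence fragment returned by reader.readtext. The selection rule can be tried without loading EasyOCR by feeding it a hand-written list in the same (bbox, text, confidence) layout that readtext produces with detail=1. In the sketch below, `pick_top_fragment` is a hypothetical stand-in for read_easyocr and the detections are invented.

```python
import re

def clean_plate_text(s):
    """Upper-case the text and drop everything except A-Z and 0-9."""
    return re.sub(r"[^A-Z0-9]", "", s.upper())

def pick_top_fragment(detections):
    """Return (cleaned_text, confidence) of the most confident fragment."""
    if not detections:
        return "", 0.0
    _, text, conf = max(detections, key=lambda d: d[2])
    return clean_plate_text(text), float(conf)

# Invented detections in the (bbox, text, confidence) layout of readtext(detail=1).
fake = [
    ([[0, 0], [40, 0], [40, 20], [0, 20]], "IND", 0.35),
    ([[45, 0], [200, 0], [200, 20], [45, 20]], "mh 12-ab 1234", 0.82),
]
print(pick_top_fragment(fake))  # ('MH12AB1234', 0.82)
print(pick_top_fragment([]))    # ('', 0.0)
```

Because only one fragment is kept, a plate that the OCR engine splits into several fragments (for example a two-line plate) may come back as just one of its parts.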


### OCR Quick Test (First 20 crops)
This test checks if OCR is producing readable text from crops.


In [14]:
crop_files=glob.glob(os.path.join(CROP_DIR,"*.jpg"))+glob.glob(os.path.join(CROP_DIR,"*.png"))+glob.glob(os.path.join(CROP_DIR,"*.jpeg"))
print("Total crops:",len(crop_files))

ocr_rows=[]
for p in crop_files[:20]:
    img=cv2.imread(p)
    g,th=ocr_preprocess(img)
    txt,conf=read_easyocr(th)
    ocr_rows.append([os.path.basename(p),txt,conf])

df_ocr=pd.DataFrame(ocr_rows,columns=["crop_image","text","ocr_conf"])
df_ocr


Total crops: 58




| | crop_image | text | ocr_conf |
|---|---|---|---|
| 0 | MH12_jpg.rf.61628c89066f23cf5ed9cbf20eb8f03c.jpg | MHUZENJ5838 | 0.219473 |
| 1 | MP2_jpg.rf.9807237dd678b9816d990184ea8e9fcf.jpg | FAPOGCLEZ4U | 0.045095 |
| 2 | DL19_jpg.rf.d2b546ac8b0d1886eca4a57ac5d13123.jpg | IHCI2782 | 0.032406 |
| 3 | 6ec9d264-9eab-4027-bdcc-6f71c842ee75___3e7fd38... | FUK07BA7252 | 0.666549 |
| 4 | WB17_jpg.rf.43fc8f3edd24b32e03f66bb89fcb0fc3.jpg | 270OZAJUS34 | 0.18444 |
| 5 | video9_110_jpg.rf.8242c9bac599c3f022505b9fcb01... | AKHOIDT1917 | 0.241942 |
| 6 | car-wbs-KL60N5344_00000_jpeg.rf.3bcacbcc010fcf... | KL6ON5244 | 0.059124 |
| 7 | video8_850_jpg.rf.9d1da7d6192c280a0395c2adf0e9... | MH01BT0050 | 0.255816 |
| 8 | car-ybs-MH46AD5258_00000_png.rf.c99d640fe1fac6... | HHAGAD3258 | 0.351983 |
| 9 | 5cbd7465-ad12-4e6b-8eaf-d7056c3852f8___New-201... | FHRZEKD6ZU | 0.015602 |


## Section 9 — Feature Engineering (Uniqueness)

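This section adds three post-processing steps on top of the raw OCR output:

1. Plate format validation (regex for the common Indian layout: 2 letters, 1-2 digits, 1-2 letters, 4 digits)
2. Multi-preprocess OCR selection (raw + gray + threshold): OCR runs on all three variants and the best candidate is kept, preferring regex-valid text first and higher OCR confidence second
3. Confidence fusion:
   final_conf = 0.6 * YOLO_conf + 0.4 * OCR_conf

Together these steps make the result less dependent on any single preprocessing variant and flag outputs that do not look like a plate.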
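
As a worked check of the fusion formula, the sketch below recomputes final_conf for two rows of the Section 10 results table. `fuse_confidence` is a hypothetical helper name; the weights are the ones given in the formula.

```python
def fuse_confidence(yolo_conf, ocr_conf, w_yolo=0.6, w_ocr=0.4):
    """Weighted sum of detector and OCR confidence."""
    return w_yolo * yolo_conf + w_ocr * ocr_conf

# Values copied from two rows of the Section 10 results table.
print(round(fuse_confidence(0.06087, 0.995272), 6))   # 0.434631
print(round(fuse_confidence(0.065987, 0.276537), 6))  # 0.150207
```

With the detector weighted at 0.6, a low YOLO confidence caps the fused score: in the first row the OCR is almost certain (0.995) yet final_conf stays below 0.44.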


In [15]:
plate_re=re.compile(r"^[A-Z]{2}[0-9]{1,2}[A-Z]{1,2}[0-9]{4}$")

def is_valid_plate(t):
    return bool(plate_re.match(t))
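
The behaviour of the pattern is easiest to judge on real strings. The sketch below runs the same regex on OCR outputs that appear elsewhere in this notebook; nothing here is new apart from the loop.

```python
import re

plate_re = re.compile(r"^[A-Z]{2}[0-9]{1,2}[A-Z]{1,2}[0-9]{4}$")

# Strings taken from the OCR quick test and the Section 10 results.
samples = ["HR26CH3604", "UK07BA7252", "MHO3BS7778", "FUK07BA7252", "KL6ON5244"]
verdicts = {s: bool(plate_re.match(s)) for s in samples}
for s, ok in verdicts.items():
    print(s, ok)
# HR26CH3604 True
# UK07BA7252 True
# MHO3BS7778 False   (letter O where a digit is expected)
# FUK07BA7252 False  (extra leading character)
# KL6ON5244 True
```

The last case shows the limit of format validation: KL6ON5244 fits the pattern, although the source filename (car-wbs-KL60N5344_...) suggests the plate actually reads KL60N5344. A regex match confirms the shape of the text, not its correctness.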


In [16]:
def best_ocr(bgr):
    g,th=ocr_preprocess(bgr)

    candidates=[]
    for im in [bgr,g,th]:
        txt,conf=read_easyocr(im)
        if txt:
            candidates.append((txt,conf,is_valid_plate(txt)))

    if not candidates:
        return "",0.0,False

    candidates=sorted(candidates,key=lambda x:(x[2],x[1]),reverse=True)
    return candidates[0]
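
The sort key in best_ocr ranks candidates by validity first and confidence second, so a regex-valid reading wins even when another variant is more confident. A small sketch with invented candidates (`rank_candidates` is a hypothetical name for the sorting step, and the validity flags are set by hand):

```python
def rank_candidates(candidates):
    """Sort (text, conf, valid) tuples: valid first, then by confidence."""
    return sorted(candidates, key=lambda c: (c[2], c[1]), reverse=True)

# Invented candidates, one per preprocessing variant (raw, gray, threshold).
cands = [
    ("MHO3BS7778", 0.89, False),  # high confidence, fails the regex
    ("MH03BS7778", 0.41, True),   # lower confidence, passes the regex
    ("MH03B57778", 0.30, False),
]
best = rank_candidates(cands)[0]
print(best)  # ('MH03BS7778', 0.41, True)
```

This works because Python compares tuples element by element and treats True as greater than False.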


## Section 10 — Full ALPR Pipeline (Images)

Pipeline:
Image → YOLO detect → crop → OCR preprocess → EasyOCR → validation → final result

Output:
- plate text
- yolo confidence
- ocr confidence
- final confidence score
- validity (regex match)


In [17]:
def alpr_image(img_path,conf=0.05):
    img=cv2.imread(img_path)
    r=detector.predict(img_path,conf=conf,verbose=False)[0]

    if r.boxes is None or len(r.boxes)==0:
        return {"image":os.path.basename(img_path),"plate":"","yolo_conf":0.0,"ocr_conf":0.0,"valid":False,"final_conf":0.0}

    k=int(r.boxes.conf.argmax())
    yconf=float(r.boxes.conf[k])
    xyxy=r.boxes.xyxy[k].cpu().numpy()

    crop=crop_plate(img,xyxy,pad=10)
    txt,oc,ok=best_ocr(crop)

    final=0.6*yconf+0.4*oc
    return {"image":os.path.basename(img_path),"plate":txt,"yolo_conf":yconf,"ocr_conf":oc,"valid":ok,"final_conf":final}


In [18]:
results=[]
for p in test_imgs[:50]:
    results.append(alpr_image(p,conf=0.05))

df_final=pd.DataFrame(results)
df_final.head(10)




| | image | plate | yolo_conf | ocr_conf | valid | final_conf |
|---|---|---|---|---|---|---|
| 0 | 0c9ebe94-827d-4c74-9950-6816e70d1bab___IMG_888... | KH20BY3665 | 0.065987 | 0.276537 | True | 0.150207 |
| 1 | 206c95ff-83b8-4273-b105-6637bf9a3038___numerix... | | 0.0 | 0.0 | False | 0.0 |
| 2 | 2c9306ab-3454-4ca0-89fd-db3f51dabcef___3e7fd38... | | 0.0 | 0.0 | False | 0.0 |
| 3 | 49bdf0d9-4e64-41eb-9c19-eabdc4afb051___Maruti-... | HR26CH3604 | 0.06087 | 0.995272 | True | 0.434631 |
| 4 | 5cbd7465-ad12-4e6b-8eaf-d7056c3852f8___New-201... | HR2GDKO83U | 0.06966 | 0.322626 | False | 0.170846 |
| 5 | 684f079f-6085-420c-8e4e-e1aed270fc05___Toyota-... | | 0.0 | 0.0 | False | 0.0 |
| 6 | 6e52e238-927b-46e0-a8f1-5f77ef1ecf9f___3e7fd38... | | 0.0 | 0.0 | False | 0.0 |
| 7 | 6ec9d264-9eab-4027-bdcc-6f71c842ee75___3e7fd38... | UK07BA7252 | 0.135837 | 0.273678 | True | 0.190973 |
| 8 | 7f768bfc-4c86-4fff-932d-c9871a82dce5___1431955... | MHO3BS7778 | 0.091505 | 0.887653 | False | 0.409964 |
| 9 | 8187c22d-5fa4-4976-9c00-80b2ab66b97d___1290890... | | 0.0 | 0.0 | False | 0.0 |


In [19]:
final_csv=os.path.join(OUT_DIR,"final_alpr_results.csv")
df_final.to_csv(final_csv,index=False)
print("Saved:",final_csv)


Saved: /content/alpr_outputs/final_alpr_results.csv


## Section 11 — Video Pipeline (Optional)

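This section runs ALPR on video with:
- frame skipping (detection and OCR run only on every skip-th frame; the other frames are copied to the output unchanged)
- OCR caching (results are stored per recognised text; as written, the lookup happens after OCR has already run, so it does not yet avoid repeated OCR calls)
- saves annotated output video
- saves CSV log

This is optional but improves hackathon demo value.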


In [20]:
def alpr_video(video_path,conf=0.05,skip=5):
    cap=cv2.VideoCapture(video_path)
    out_path=os.path.join(OUT_DIR,"video_out.mp4")

    w=int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h=int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps=cap.get(cv2.CAP_PROP_FPS) or 25.0  # some containers report 0; 25 fps is an assumed fallback

    fourcc=cv2.VideoWriter_fourcc(*"mp4v")
    out=cv2.VideoWriter(out_path,fourcc,fps,(w,h))

    cache={}
    f=0
    logs=[]

    while True:
        ok,frame=cap.read()
        if not ok:
            break
        f+=1

        if f%skip!=0:
            out.write(frame)
            continue

        r=detector.predict(frame,conf=conf,verbose=False)[0]
        frame_out=frame

        if r.boxes is not None and len(r.boxes)>0:
            k=int(r.boxes.conf.argmax())
            xyxy=r.boxes.xyxy[k].cpu().numpy()
            yconf=float(r.boxes.conf[k])

            crop=crop_plate(frame,xyxy,pad=10)
            txt,oc,valid=best_ocr(crop)

            if txt in cache:
                txt,oc,valid=cache[txt]
            else:
                cache[txt]=(txt,oc,valid)

            logs.append([f,txt,yconf,oc,valid])
            frame_out=r.plot()

        out.write(frame_out)

    cap.release()
    out.release()

    dfv=pd.DataFrame(logs,columns=["frame","plate","yolo_conf","ocr_conf","valid"])
    dfv.to_csv(os.path.join(OUT_DIR,"video_log.csv"),index=False)

    return out_path
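
In alpr_video the cache is consulted only after best_ocr has run, so it cannot save any OCR work. One possible way to skip OCR is to key the cache on a coarsened bounding box instead, under the assumption that a plate stays in roughly the same place between processed frames. This is a sketch of that idea, not the notebook's method: `bbox_key`, `cached_ocr`, the 32-pixel cell size and the stand-in OCR function are all hypothetical.

```python
def bbox_key(xyxy, cell=32):
    """Quantise box corners to a coarse grid so nearby boxes share a key."""
    return tuple(int(v) // cell for v in xyxy)

def cached_ocr(cache, xyxy, crop, ocr_fn, cell=32):
    """Run ocr_fn(crop) only when no result is cached for this grid cell."""
    key = bbox_key(xyxy, cell)
    if key not in cache:
        cache[key] = ocr_fn(crop)
    return cache[key]

# Stand-in for best_ocr that counts how often it is called.
calls = {"n": 0}
def fake_ocr(crop):
    calls["n"] += 1
    return ("MH01BT0050", 0.26, True)

cache = {}
cached_ocr(cache, (100, 200, 220, 240), None, fake_ocr)
cached_ocr(cache, (103, 198, 222, 243), None, fake_ocr)  # same grid cell: cache hit
cached_ocr(cache, (400, 200, 520, 240), None, fake_ocr)  # different cell: OCR runs
print(calls["n"])  # 2
```

The trade-off is that a box drifting across a grid boundary causes a cache miss, and a different vehicle appearing at the same position would get a stale result. An object tracker would handle both cases more reliably.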


## Section 12 — Evaluation + Export

This section:
- prints basic evaluation summary
- exports YOLO model to ONNX for lightweight deployment (optional)
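
The ONNX export is listed as optional and no cell for it appears in this notebook. A minimal sketch, assuming the Ultralytics `export` method with `format="onnx"` and the MODEL_PATH defined in Section 0; `export_to_onnx` is a hypothetical wrapper, and the export may need extra packages depending on the environment.

```python
def export_to_onnx(weights_path):
    """Export YOLOv8 weights to ONNX via the Ultralytics export API.

    Returns whatever Ultralytics reports as the exported file path.
    """
    from ultralytics import YOLO  # imported lazily; requires `pip install ultralytics`
    model = YOLO(weights_path)
    return model.export(format="onnx")

# Example call (commented out because it needs the trained weights file):
# onnx_path = export_to_onnx(MODEL_PATH)
```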


In [21]:
print("Total samples:",len(df_final))
print("Valid plates:",int(df_final["valid"].sum()))
print("Avg final confidence:",float(df_final["final_conf"].mean()))


Total samples: 50
Valid plates: 9
Avg final confidence: 0.10788017929660967
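
The average above is taken over all 50 rows, including images where nothing was detected and final_conf is 0.0, which pulls the mean down. The sketch below separates the two views on the first four rows of the Section 10 results table. `summarize` is a hypothetical helper and works on plain dicts rather than on df_final.

```python
def summarize(rows):
    """rows: list of dicts with 'plate', 'valid' and 'final_conf' keys."""
    total = len(rows)
    with_text = [r for r in rows if r["plate"]]  # rows where OCR returned text
    valid = sum(1 for r in rows if r["valid"])
    return {
        "total": total,
        "with_text": len(with_text),
        "valid": valid,
        "valid_rate": valid / total if total else 0.0,
        "avg_conf_all": sum(r["final_conf"] for r in rows) / total if total else 0.0,
        "avg_conf_with_text": (sum(r["final_conf"] for r in with_text) / len(with_text)
                               if with_text else 0.0),
    }

# First four rows of the Section 10 results table.
rows = [
    {"plate": "KH20BY3665", "valid": True, "final_conf": 0.150207},
    {"plate": "", "valid": False, "final_conf": 0.0},
    {"plate": "", "valid": False, "final_conf": 0.0},
    {"plate": "HR26CH3604", "valid": True, "final_conf": 0.434631},
]
s = summarize(rows)
print(s["with_text"], s["valid"], round(s["avg_conf_all"], 4), round(s["avg_conf_with_text"], 4))
# 2 2 0.1462 0.2924
```

For the full run, 9 valid plates out of 50 samples is a valid rate of 18%. Neither number measures reading accuracy, since no ground-truth plate text is compared.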


# Completed

This Mega Notebook completed:
- Plate detection
- Cropping
- OCR preprocessing
- EasyOCR extraction
- Feature engineering
- Full pipeline (image)
- Optional video pipeline
- Evaluation summary

Next steps:
- Improve detection confidence by training longer (GPU recommended)
- Add UI (Streamlit) or API (FastAPI) for deployment
