<a href="https://colab.research.google.com/gist/SoKawai1/4eb36e1ac560002936d1013c0864232a/anime_facial_expressions_df.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Collecting and cleaning the "Anime faces classification" dataset

Materials:
*   [nagadomi's LBP cascade](https://github.com/nagadomi/lbpcascade_animeface)
*   [Danbooru (images)](https://danbooru.donmai.us/)
---
Date: 25.01.2025

Importing libraries

In [None]:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import os
from google.colab import files, drive
import pandas as pd
import cv2
import kagglehub
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns
import sys
import requests
from io import BytesIO
import shutil

Mounting Google Drive

In [None]:
drive.mount('/content/drive')

Mounted at /content/drive




---



Downloading the LBP cascade for detecting anime-style faces

In [None]:
!wget https://raw.githubusercontent.com/nagadomi/lbpcascade_animeface/master/lbpcascade_animeface.xml

--2025-01-04 09:07:26--  https://raw.githubusercontent.com/nagadomi/lbpcascade_animeface/master/lbpcascade_animeface.xml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 246945 (241K) [text/plain]
Saving to: ‘lbpcascade_animeface.xml’


2025-01-04 09:07:26 (47.4 MB/s) - ‘lbpcascade_animeface.xml’ saved [246945/246945]





---

A function for fetching images by tag, and a function for detecting and cropping faces from images

In [None]:
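def fetch_images_with_tag(tag, page=1, limit=200):
    """Return the file URLs of the Danbooru posts that match `tag` on one results page."""
    url = "https://danbooru.donmai.us/posts.json"
    # Let requests URL-encode the query, since the tag string may contain spaces
    response = requests.get(url, params={"tags": tag, "limit": limit, "page": page})
    if response.status_code == 200:
        posts = response.json()
        # Some posts come without a file_url, so skip them
        return [post['file_url'] for post in posts if 'file_url' in post]
    else:
        print(f"Error: HTTP {response.status_code}")
        return []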
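
To check which query the tag search sends, the URL can be built offline. This is a minimal sketch that assumes the same `posts.json` endpoint as the function above; `build_search_url` is a hypothetical helper and not part of the notebook. A space in the tag string is encoded as `+`, and to my understanding Danbooru reads it as a separator between tags, so `"crying tears"` asks for posts tagged with both `crying` and `tears`.

```python
from urllib.parse import urlencode

def build_search_url(tag, page=1, limit=200):
    """Build the posts.json search URL without sending a request (hypothetical helper)."""
    query = urlencode({"tags": tag, "limit": limit, "page": page})
    return f"https://danbooru.donmai.us/posts.json?{query}"

print(build_search_url("crying tears", page=21))
```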


In [None]:
def detect_and_crop_from_tag(tag, output_dir, start_page=1, pages=1, cascade_file="lbpcascade_animeface.xml"):
    if not os.path.isfile(cascade_file):
        raise RuntimeError(f"{cascade_file}: not found")

    os.makedirs(output_dir, exist_ok=True)
    cascade = cv2.CascadeClassifier(cascade_file)

    face_counter = 0
    for page in range(start_page, start_page + pages):
        print(f"Fetching images from page {page}...")
        image_urls = fetch_images_with_tag(tag, page=page)

        for url in image_urls:
            print(f"Processing URL: {url}")
            img_response = requests.get(url)
            # .get() avoids a KeyError when the Content-Type header is missing
            if img_response.status_code == 200 and 'image' in img_response.headers.get('Content-Type', ''):
                try:
                    img = Image.open(BytesIO(img_response.content))
                    img = img.convert("RGB")
                    img_array = np.array(img)

                    # The cascade works on grayscale; PIL arrays are RGB, hence RGB2GRAY
                    gray = cv2.cvtColor(img_array, cv2.COLOR_RGB2GRAY)

                    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(24, 24))
                    print(f"Detected {len(faces)} faces.")
                    for (x, y, w, h) in faces:
                        face = img_array[y:y+h, x:x+w]
                        face_path = os.path.join(output_dir, f"face_{face_counter}.jpg")
                        # OpenCV expects BGR channel order when writing
                        cv2.imwrite(face_path, cv2.cvtColor(face, cv2.COLOR_RGB2BGR))
                        print(f"Saved: {face_path}")
                        face_counter += 1

                except Exception as e:
                    print(f"Error: {e}")
            else:
                print("Error: the server did not return an image")

A function for fetching images by post ID and cropping faces from them

In [None]:
def download_and_crop_faces_by_ids(image_ids, output_dir="cropped_faces", cascade_file="lbpcascade_animeface.xml"):
    os.makedirs(output_dir, exist_ok=True)
    cascade = cv2.CascadeClassifier(cascade_file)

    for image_id in image_ids:
        url = f"https://danbooru.donmai.us/posts/{image_id}.json"
        response = requests.get(url)

        if response.status_code == 200:
            post_data = response.json()
            image_url = post_data.get('file_url')

            if image_url:
                print(f"Downloading image {image_id} from {image_url}")
                img_response = requests.get(image_url)

                if img_response.status_code == 200:
                    img = Image.open(BytesIO(img_response.content))
                    # Normalize palette, RGBA and other modes so the array is 2-D ("L") or RGB
                    if img.mode not in ("L", "RGB"):
                        img = img.convert("RGB")
                    img = np.array(img)

                    # Handle grayscale and color images
                    if img.ndim == 2:  # the image is already grayscale
                        gray = img
                    else:  # color image; PIL arrays are RGB, not BGR
                        gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)

                    faces = cascade.detectMultiScale(
                        gray,
                        scaleFactor=1.1,
                        minNeighbors=5,
                        minSize=(24, 24)
                    )
                    print(f"Detected {len(faces)} faces in image {image_id}.")

                    face_counter = 0
                    for (x, y, w, h) in faces:
                        face = img[y:y+h, x:x+w]
                        face_path = os.path.join(output_dir, f"{image_id}_face_{face_counter}.png")
                        # OpenCV writes BGR, so swap the channels of color crops
                        if face.ndim == 3:
                            face = cv2.cvtColor(face, cv2.COLOR_RGB2BGR)
                        cv2.imwrite(face_path, face)
                        print(f"Saved face {face_counter} as {face_path}")
                        face_counter += 1
                else:
                    print(f"Error: could not download image {image_id}")
            else:
                print(f"Error: post {image_id} has no file_url")
        else:
            print(f"Error: could not fetch post {image_id} (HTTP {response.status_code})")


Scraping images and cropping faces

In [None]:
tag = "crying tears"
# Folder crying_v{i} receives results page i + 1 (pages 21 to 30)
for i in range(20, 30):
    detect_and_crop_from_tag(tag, f'crying_v{i}', start_page=i + 1, pages=1)

Fetching images from page 21...
Processing URL: https://cdn.donmai.us/original/7b/29/7b29ce63914890fb3ab4dbb2647656a0.jpg
Detected 2 faces.
Saved: crying_v20/face_0.jpg
Saved: crying_v20/face_1.jpg
Processing URL: https://cdn.donmai.us/original/21/35/2135ca7b4349dfc4327d1e11b53ddc44.jpg
Detected 0 faces.
Processing URL: https://cdn.donmai.us/original/84/0b/840bb4da52ef081864eaa8b804b566fe.jpg
Detected 0 faces.
Processing URL: https://cdn.donmai.us/original/b4/76/b4764b0f781732fdf71dd44779b57d4d.png
Detected 0 faces.
Processing URL: https://cdn.donmai.us/original/20/44/20449a66ca76d2d26cb9383b77b343af.jpg
Detected 1 faces.
Saved: crying_v20/face_2.jpg
Processing URL: https://cdn.donmai.us/original/ad/eb/adeb8dc11346b080350312b9e7335846.jpg
Detected 1 faces.
Saved: crying_v20/face_3.jpg
Processing URL: https://cdn.donmai.us/original/3e/2f/3e2f8f479a14fcdcf57a9879962c17d5.jpg
Detected 0 faces.
Processing URL: https://cdn.donmai.us/original/61/dc/61dc5e55afab4b4ecee772797211243e.jpg
Detect
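
The loop above pairs each output folder with the page that follows its index, so `crying_v20` receives page 21, as the log shows. A small sketch that lists the mapping without downloading anything; `plan_batches` is a hypothetical name used only for illustration.

```python
def plan_batches(first=20, last=30, prefix="crying_v"):
    """Return (folder, page) pairs as used by the scraping loop: folder i gets page i + 1."""
    return [(f"{prefix}{i}", i + 1) for i in range(first, last)]

for folder, page in plan_batches():
    print(f"{folder} <- page {page}")
```

Listing the plan first makes it easier to spot an off-by-one between folder names and page numbers before spending time on downloads.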

Packing the folders into zip archives and downloading them

In [None]:
for i in range(20,30):
  !zip -r crying_v{i}.zip crying_v{i}
for i in range(20,30):
  files.download(f'crying_v{i}.zip')

updating: crying_v20/ (stored 0%)
updating: crying_v20/face_65.jpg (deflated 1%)
updating: crying_v20/face_53.jpg (deflated 1%)
updating: crying_v20/face_2.jpg (deflated 1%)
updating: crying_v20/face_3.jpg (deflated 2%)
updating: crying_v20/face_64.jpg (deflated 2%)
updating: crying_v20/face_43.jpg (deflated 1%)
updating: crying_v20/face_23.jpg (deflated 0%)
updating: crying_v20/face_21.jpg (deflated 2%)
updating: crying_v20/face_45.jpg (deflated 2%)
updating: crying_v20/face_66.jpg (deflated 1%)
updating: crying_v20/face_19.jpg (deflated 1%)
updating: crying_v20/face_32.jpg (deflated 1%)
updating: crying_v20/face_70.jpg (deflated 1%)
updating: crying_v20/face_6.jpg (deflated 1%)
updating: crying_v20/face_18.jpg (deflated 1%)
updating: crying_v20/face_5.jpg (deflated 1%)
updating: crying_v20/face_54.jpg (deflated 1%)
updating: crying_v20/face_68.jpg (deflated 3%)
updating: crying_v20/face_42.jpg (deflated 1%)
updating: crying_v20/face_10.jpg (deflated 1%)
updating: crying_v20/face_20.j
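
The same archives can be produced without shelling out to `zip`. This is a minimal sketch using the standard library's `shutil.make_archive`; `zip_folder` is a hypothetical helper, and the demo runs on a temporary folder with a dummy file instead of the real `crying_v*` folders.

```python
import os
import shutil
import tempfile
import zipfile

def zip_folder(folder):
    """Create <folder>.zip next to the folder, keeping the folder name inside the archive."""
    parent, name = os.path.split(os.path.abspath(folder))
    return shutil.make_archive(os.path.join(parent, name), "zip", root_dir=parent, base_dir=name)

# Demo on a throwaway folder that holds one dummy file (not a real image)
work = tempfile.mkdtemp()
demo = os.path.join(work, "crying_v20")
os.makedirs(demo)
with open(os.path.join(demo, "face_0.jpg"), "wb") as f:
    f.write(b"not a real image")

archive = zip_folder(demo)
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(archive)
print(names)
```

In Colab the returned path could then be passed to `files.download`, as in the cell above.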




---



In [None]:
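# Read the Danbooru post IDs from the text file
# (assumes the IDs are separated by whitespace, e.g. one per line)
with open('/content/drive/happy_image_ids.txt') as f:
    happy_img_ids = f.read().split()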

In [None]:
happy_img_ids = np.array(happy_img_ids)

In [None]:
download_and_crop_faces_by_ids(happy_img_ids, 'happy_img')

Downloading image 8497341 from https://cdn.donmai.us/original/dd/0f/dd0f60cc90cab92fa5bda7d11e7cebe6.jpg
Detected 7 faces in image 8497341.
Saved face 0 as happy_manual/8497341_face_0.png
Saved face 1 as happy_manual/8497341_face_1.png
Saved face 2 as happy_manual/8497341_face_2.png
Saved face 3 as happy_manual/8497341_face_3.png
Saved face 4 as happy_manual/8497341_face_4.png
Saved face 5 as happy_manual/8497341_face_5.png
Saved face 6 as happy_manual/8497341_face_6.png
Downloading image 8484083 from https://cdn.donmai.us/original/c1/27/c1279a1f5c13b028093281f0660ed1b5.png
Detected 0 faces in image 8484083.
Downloading image 8484350 from https://cdn.donmai.us/original/38/88/3888df603c217f98077d9ba705553e20.png
Detected 0 faces in image 8484350.
Downloading image 8484291 from https://cdn.donmai.us/original/8d/c0/8dc043c0432151d1af5a1e0e671fc24a.png
Detected 1 faces in image 8484291.
Saved face 0 as happy_manual/8484291_face_0.png
Downloading image 8481557 from https://cdn.donmai.us/ori

In [None]:
!zip -r happy_auto.zip happy_auto #crying, tears, sad, happy manual,

  adding: happy_auto/ (stored 0%)
  adding: happy_auto/face_77.jpg (deflated 0%)
  adding: happy_auto/face_20.jpg (deflated 1%)
  adding: happy_auto/face_75.jpg (deflated 1%)
  adding: happy_auto/.ipynb_checkpoints/ (stored 0%)
  adding: happy_auto/face_51.jpg (deflated 1%)
  adding: happy_auto/face_1.jpg (deflated 1%)
  adding: happy_auto/face_92.jpg (deflated 1%)
  adding: happy_auto/face_48.jpg (deflated 0%)
  adding: happy_auto/face_72.jpg (deflated 1%)
  adding: happy_auto/face_99.jpg (deflated 3%)
  adding: happy_auto/face_40.jpg (deflated 0%)
  adding: happy_auto/face_56.jpg (deflated 1%)
  adding: happy_auto/face_100.jpg (deflated 2%)
  adding: happy_auto/face_32.jpg (deflated 1%)
  adding: happy_auto/face_61.jpg (deflated 0%)
  adding: happy_auto/face_47.jpg (deflated 0%)
  adding: happy_auto/face_62.jpg (deflated 1%)
  adding: happy_auto/face_42.jpg (deflated 1%)
  adding: happy_auto/face_86.jpg (deflated 1%)
  adding: happy_auto/face_13.jpg (deflated 2%)
  adding: happy_auto



---



Extracting the folders of images for further organization of the data

In [None]:
import zipfile  # not imported in the first cell

zip_path = "/content/smile_df.zip"
extract_path = "/content/"

with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_path)

Moving the files from the nested folders into one shared folder and renaming them



In [None]:
source_folder = "smile_df"
destination_folder = "smiling_expressions_df"

os.makedirs(destination_folder, exist_ok=True)

for subfolder in os.listdir(source_folder):
    subfolder_path = os.path.join(source_folder, subfolder)
    if os.path.isdir(subfolder_path):
        for filename in os.listdir(subfolder_path):
            file_path = os.path.join(subfolder_path, filename)
            if os.path.isfile(file_path):
                new_filename = f"{subfolder}_{filename}"
                destination_path = os.path.join(destination_folder, new_filename)
                shutil.move(file_path, destination_path)
                print(f"Moved: {file_path} -> {destination_path}")

Moved: smile_df/smile_v11/face_63.jpg -> smiling_expressions_df/smile_v11_face_63.jpg
Moved: smile_df/smile_v11/face_2.jpg -> smiling_expressions_df/smile_v11_face_2.jpg
Moved: smile_df/smile_v11/face_9.jpg -> smiling_expressions_df/smile_v11_face_9.jpg
Moved: smile_df/smile_v11/face_107.jpg -> smiling_expressions_df/smile_v11_face_107.jpg
Moved: smile_df/smile_v11/face_75.jpg -> smiling_expressions_df/smile_v11_face_75.jpg
Moved: smile_df/smile_v11/face_113.jpg -> smiling_expressions_df/smile_v11_face_113.jpg
Moved: smile_df/smile_v11/face_90.jpg -> smiling_expressions_df/smile_v11_face_90.jpg
Moved: smile_df/smile_v11/face_0.jpg -> smiling_expressions_df/smile_v11_face_0.jpg
Moved: smile_df/smile_v11/face_28.jpg -> smiling_expressions_df/smile_v11_face_28.jpg
Moved: smile_df/smile_v11/face_36.jpg -> smiling_expressions_df/smile_v11_face_36.jpg
Moved: smile_df/smile_v11/face_3.jpg -> smiling_expressions_df/smile_v11_face_3.jpg
Mov
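
Every batch folder numbers its faces from `face_0.jpg`, so moving the files without the subfolder prefix would overwrite earlier ones. This is a minimal sketch of the same flattening step wrapped in a function and run on temporary folders; `flatten_folders` and the `smile_v12` batch are hypothetical names used only for the demo.

```python
import os
import shutil
import tempfile

def flatten_folders(source_folder, destination_folder):
    """Move every file from each subfolder into one folder, prefixing the subfolder name."""
    os.makedirs(destination_folder, exist_ok=True)
    moved = []
    for subfolder in sorted(os.listdir(source_folder)):
        subfolder_path = os.path.join(source_folder, subfolder)
        if not os.path.isdir(subfolder_path):
            continue
        for filename in sorted(os.listdir(subfolder_path)):
            file_path = os.path.join(subfolder_path, filename)
            if os.path.isfile(file_path):
                new_name = f"{subfolder}_{filename}"
                shutil.move(file_path, os.path.join(destination_folder, new_name))
                moved.append(new_name)
    return moved

# Demo: two batches that both contain a file called face_0.jpg
root = tempfile.mkdtemp()
for batch in ("smile_v11", "smile_v12"):
    os.makedirs(os.path.join(root, "smile_df", batch))
    with open(os.path.join(root, "smile_df", batch, "face_0.jpg"), "wb") as f:
        f.write(b"placeholder")

destination = os.path.join(root, "smiling_expressions_df")
result = flatten_folders(os.path.join(root, "smile_df"), destination)
print(result)
```

Both files survive the move because the prefix keeps their names distinct.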