AIDM-7410 Datasets-and-Codes

This repository was created in Semester 2 of the 2022-2023 academic year. It stores the datasets and code for the group project of the course AIDM-7410 Computational Journalism at the School of Communication, Hong Kong Baptist University (HKBU). The project was created by JIANG Zhuohao, YU Minghao, and HE Yuchen.

We used AI techniques and models, together with visualization components, to create a data-driven news piece focused on the World Press Photo (WPP) contest.

Specifically, we collected and analyzed 2,880 entries and over 12,800 photos from all WPP awards since 1955, and examined the descriptions of these award-winning photos and the information about the photographers published on the official WPP website.

Our goal was to identify trends and characteristics of WPP awards, both externally and internally, to demonstrate the overall criteria and taste of the awards (and how they have changed), and ultimately to offer an answer and some guidance to anyone interested in the question:
WHAT MAKES A PRIZE IN World Press Photo CONTESTS?

The following describes the web scraping, AI analysis, and visualisation details.

Directory Structure Description

├── README.md    // Help document
├── output    // Data files used by the project
│   ├── dfimage    // Every photo's information with local storage path, by year
│   ├── dfimageurl    // Every photo's information with image URL, by year
│   ├── database_photographer_all.csv    // Every photographer's official information from the WPP website, combined
│   ├── dfimageurlall.csv    // Every photo's information with image URL, combined
│   ├── dfimageurlall_summerized.csv    // Every photo's information with shortened description
│   ├── dfimageurlall_summerized_location.csv    // Every photo's information with shortened description and location
│   ├── timeline    // Awards per country over time
│   ├── new_df_country(3).csv    // Data used to build the earth model, after manual correction of locations
│   ├── photographer    // Top 10 and top 20 photographers
│   ├── prizegroup    // Distribution of awards for protest-related photos
│   └── dfphotographer.csv    // Direct link to each work's presentation page
└── code    // Core code, including the basic implementation of each part
    ├── ScrapeURL.ipynb    // Scrape the information and photo URL of every stored photo
    ├── ImageDownload.ipynb    // Download every photo locally for AI analysis
    ├── Summerization.ipynb    // Use the pre-trained Flan-T5 model to shorten the descriptions of photo stories
    ├── Location-detection.ipynb    // Use the UIE base model together with the spaCy and pycountry libraries to extract the location where each photo was taken
    ├── Combine picture.ipynb    // Stitch pictures
    ├── country net.ipynb    // Extraction of the country where each photo was taken, analysis of the photographers' footprints
    └── photographer.ipynb    // Recap of awards for all countries
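
As a rough illustration of how an "awards per country over time" table like the one in the timeline folder could be derived from a combined CSV, here is a minimal sketch using only the Python standard library. The column names `year` and `country`, the function name `awards_per_country_by_year`, and the sample rows are assumptions made for this example; they are not taken from the notebooks, which may work differently.

```python
import csv
import io
from collections import Counter


def awards_per_country_by_year(csv_file, year_col="year", country_col="country"):
    """Count records per (year, country) pair in a CSV of award entries.

    The column names are hypothetical defaults; pass the real ones if they differ.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        year = (row.get(year_col) or "").strip()
        country = (row.get(country_col) or "").strip()
        if not year or not country:
            continue  # skip records with a missing year or country
        counts[(int(year), country)] += 1
    return counts


# Placeholder rows standing in for a real CSV file (not actual WPP data).
sample = io.StringIO(
    "year,country,title\n"
    "1955,Country A,Photo 1\n"
    "1956,Country B,Photo 2\n"
    "1956,Country B,Photo 3\n"
    "1956,,Photo 4\n"
)

timeline = awards_per_country_by_year(sample)
for (year, country), n in sorted(timeline.items()):
    print(year, country, n)
```

To run this against a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object in place of `sample`, adjusting `year_col` and `country_col` to the actual headers.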

Final integration and computational journalism presentation:

What Makes a Prize? Presentation Webpage
