- Introduction
- Importance of the Arolsen Archives & Project Impact
- Project Structure
- Project Organization
- Technologies Used
- Setup Instructions
- Usage
- Analysis Steps
- Conclusion
- Collaborators
This project identifies patterns in imprisonment locations during the Holocaust. The analysis is performed using Python and Jupyter Notebooks.
Visit our platform via the link below.
The Arolsen Archives hold the world’s most comprehensive collection of documents related to Nazi persecution, serving as a critical resource for Holocaust research. These archives contain millions of records that help families trace lost relatives and provide historians with vital evidence of past atrocities.
Our project enhances the accessibility and interpretability of these records by transforming raw data into an interactive visual tool. By doing so, we:
- Support researchers in identifying overlooked patterns in persecution.
- Assist families in better understanding the journeys of their lost relatives.
- Contribute to Holocaust education, ensuring that history is preserved and remains relevant to future generations.
This project stands as a testament to the power of data science in historical research, bridging the gap between fragmented historical records and modern analytical tools.
- `data/`: Directory containing the input data files (not publicly available).
- `webmap/`: Directory containing the web application for visualizing the data on an interactive map.
- `prepare_data_for_powerbi.ipynb`: Jupyter Notebook for preparing data for Power BI visualization.
- `geolocation_patterns.ipynb`: Jupyter Notebook for analyzing geolocation patterns.
- `geolocation_map_prediction.ipynb`: Jupyter Notebook for predicting geolocations on the map.
- `Data Cleaning+OneHotEncoding, Markov Chain.ipynb`: Jupyter Notebook for data cleaning, one-hot encoding, and Markov chain analysis.
├── data <- Directory containing the input data files
├── frontend <- Vite+React Application for our Frontend
├── webmap <- Directory containing the web application for visualizing the data on an interactive map.
├── Data Cleaning+OneHotEncoding, Markov Chain.ipynb <- initial data exploration and profiling
├── geolocation_patterns.ipynb <- Notebook containing the data analysis and visualization for geolocations
├── README.md <- The top-level README for developers using this project.
├── requirements.txt <- Python requirements
├── LICENSE <- MIT License
└── Accenture_Arolsen_Handout.pdf <- Challenge PDF
- Python
- Pandas (dataframes)
- Scikit-learn
- Surprise (for collaborative filtering)
- Matplotlib (for data visualization)
- Seaborn (for data visualization)
- Squarify (for treemap visualization)
- ydata_profiling (for automated data profiling)
- Axios: A promise-based HTTP client for the browser and Node.js.
- Bootstrap: A popular CSS framework for developing responsive and mobile-first websites.
- Font Awesome: A toolkit for vector icons and social logos.
- Shiny: A web framework for building web applications, originally developed for R and available for Python since 2022.
- Google Maps API: Used to display the map in the web data visualization.
- Folium: Used to render markers and heatmaps on the map.
- Microsoft Power BI: Interactive data visualization software, used to create interactive data charts.
- Dora Web Builder: Used to create the presentation website
1. Clone the repository:

   ```bash
   git clone https://github.com/denishotii/Data4Good25.git
   cd Data4Good25
   ```

2. Create a virtual environment:

   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   ```

3. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

4. Install additional packages used in the notebooks:

   ```bash
   pip install numpy matplotlib pandas folium ydata_profiling scikit-learn googlemaps shiny
   ```
1. Run the Jupyter Notebook server:

   ```bash
   jupyter notebook
   ```

2. Open `main.ipynb` and `AnalyseDataset.ipynb` in the Jupyter Notebook interface.

3. Follow the steps in the notebooks to perform the data analysis and visualization.

4. Navigate to the `webmap` directory:

   ```bash
   cd webmap
   ```

5. Run the web application:

   ```bash
   python app.py
   ```

6. Open your web browser and go to `http://127.0.0.1:8000` to view the interactive map.

7. Run the prediction model by opening and running the notebook `geolocation_map_prediction.ipynb`.

8. The model will predict possible routes of persecution by identifying patterns from similar records.
To achieve the goal of mapping and analyzing the movement of Holocaust victims based on tracing card data, we implemented the following steps:
- Standardized name variations and handled missing values.
- Extracted and structured relevant columns, including birthplace, nationality, geolocation, and validation score.
- Implemented similarity measures to merge duplicate records and ensure data consistency.
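As an illustration of the similarity matching used to merge duplicate records, here is a minimal sketch based on Python's standard-library `difflib`; the field name `name`, the threshold, and the sample records are assumptions for demonstration, not the project's actual data.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] after lowercasing and whitespace normalization."""
    norm = lambda s: " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

def merge_candidates(records, threshold=0.85):
    """Return index pairs of records similar enough to be duplicate candidates."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if name_similarity(records[i]["name"], records[j]["name"]) >= threshold:
                pairs.append((i, j))
    return pairs

# Invented sample records (the real archive fields differ).
records = [
    {"name": "Kowalski, Jan"},
    {"name": "kowalski  jan"},
    {"name": "Nowak, Maria"},
]
print(merge_candidates(records))  # → [(0, 1)]
```

In practice a pairwise scan is quadratic, so blocking (e.g. by birth year or first letter) would be applied before comparing.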
- Utilized the Geo Location field to map known locations of individuals.
- Developed an interactive map that visualizes movement patterns and key transit points.
- Enabled search functionality to filter individuals by name, date, and location.
- Applied clustering techniques and similarity measures to group victims with similar routes.
- Used machine learning models to predict possible missing transit locations based on known paths.
- Designed an algorithm to infer unlisted stops based on historically documented patterns of deportation.
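The Markov-chain idea behind the route inference can be sketched as follows: estimate transition probabilities between locations from observed routes, then predict the most likely next stop. The route sequences here are invented placeholders, not actual deportation records.

```python
from collections import Counter, defaultdict

def fit_transitions(routes):
    """Estimate P(next location | current location) from observed routes."""
    counts = defaultdict(Counter)
    for route in routes:
        for a, b in zip(route, route[1:]):
            counts[a][b] += 1
    return {loc: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for loc, c in counts.items()}

def most_likely_next(transitions, loc):
    """Most probable next stop, or None if the location was never observed."""
    if loc not in transitions:
        return None
    return max(transitions[loc], key=transitions[loc].get)

# Invented example routes between transit points A, B, C, D.
routes = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "B", "C"],
]
model = fit_transitions(routes)
print(most_likely_next(model, "B"))  # → C
```

A first-order chain only conditions on the current stop; the project's similarity measures add context from comparable records.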
- Integrated OCR validation to assess the trustworthiness of extracted data.
- Applied manual and automated review processes to enhance data accuracy.
- Ensured automated verification flags help prioritize records needing manual validation.
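A sketch of how the OCR validation score could drive the review flags described above; the field name `validation_score` and the cutoff are assumptions for illustration.

```python
def review_priority(record, cutoff=0.8):
    """Flag records whose OCR validation score falls below the cutoff."""
    score = record.get("validation_score")  # assumed field name
    if score is None:
        return "manual-review"  # no score at all: always check by hand
    return "manual-review" if score < cutoff else "auto-accepted"

print(review_priority({"validation_score": 0.95}))  # → auto-accepted
print(review_priority({"validation_score": 0.42}))  # → manual-review
```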
This project successfully provides a historical geographic analysis tool that reconstructs the journeys of Holocaust victims. By combining data visualization, predictive modeling, and validation techniques, we help uncover patterns in forced migration, filling gaps in historical records.
- Mapped and analyzed known victim transport routes using structured data.
- Inferred missing transit points, improving the completeness of historical records.
- Supported educational and research efforts by offering an interactive, data-driven understanding of victim movements.
This initiative serves as a step forward in digital humanities, demonstrating how data science can be used for historical preservation and awareness.