10 Academy: Artificial Intelligence Mastery
Kickstart Your AI Mastery with African Climate Trend Analysis
Date: 22 Apr - 28 Apr 2026
This week's challenge focuses on understanding, exploring, and analyzing historical climate and weather data from Ethiopia, Kenya, Sudan, Tanzania, and Nigeria. The challenge aims to evaluate candidates for the 12-week training program in Data Engineering (DE), Financial Analytics (FA), and Machine Learning Engineering (MLE).

Applicants who demonstrate sufficient performance in this week's challenge and proceed to the next level will have a clear picture of the discipline, resilience, proactivity, talent diversity, and other essential elements the 10 Academy training requires. Those who do not secure one of the limited spots will gain a clear understanding of where to improve in order to prepare for FA, DE, and MLE roles in the future. Everyone will gain project experience to showcase in their professional profile. This week is therefore a win-win for everyone, and we advise you to put your best effort into completing as many tasks as possible.

We know that the number of tasks you are required to complete is large, and you will not have time to build intuition or become comfortable with every new concept and skill you encounter this week. Please note that building a deeper understanding is not the purpose of this week's project. Moreover, you may never have attempted some of these tasks before this training. If you are confused and overwhelmed, know that this is expected. The tutors, community managers, and all other teams are there to support you as best they can. Be proactive in asking questions, share resources that may help others, and above all, persist!
EthioClimate Analytics is a data consultancy engaged by the Ethiopian Ministry of Planning and Development to support the country's preparations for hosting COP32, the United Nations Climate Change Conference, in Addis Ababa in 2027. Ethiopia's selection as host, announced at COP30 in Belém, Brazil, represents a landmark opportunity to place Africa's climate priorities at the center of the global conversation.

As a Junior Data Analyst at EthioClimate Analytics, your task is to conduct an exploratory analysis of historical climate data across five African nations. Your analysis should surface key climate trends, seasonal patterns, and anomalies that will inform Ethiopia's data-driven position ahead of the summit. Your findings must be translated into clear, evidence-backed insights that highlight Africa's unique climate vulnerabilities and support the continent's climate policy narrative. Your report should provide insight toward the overarching objective: positioning Ethiopia as a credible, data-informed host and amplifying Africa's voice in global climate negotiations.

What do we actually mean by evidence-backed, negotiation-grade insights? An analysis qualifies as negotiation-grade when it answers three layered questions:
1. What is changing? (trend, with baseline, with uncertainty)
2. What did it cause? (an impact stat: yields, displacement, GDP, disease burden, even if it has to be pulled from a secondary source)
3. What does it demand? (the policy/finance ask the evidence supports: adaptation finance, early warning systems, loss-and-damage)

A chart that answers only (1) is EDA. A chart that answers (1)+(2) is a report. A chart that answers (1)+(2)+(3) is a position paper. That's the ladder.
The data for this week's challenge has been pre-collected from the NASA Prediction of Worldwide Energy Resources (NASA POWER) database, a publicly available source of satellite-derived climate and weather measurements used by researchers and governments worldwide. Data has been extracted for five African countries: Ethiopia, Kenya, Sudan, Tanzania, and Nigeria, covering the period January 2015 – March 2026. You can find the data here. Do not commit them to GitHub. Each row represents a daily observation for a representative location within each country. The structure is as follows:
- DOY: Day of year (1–365 / 366). Must be converted, together with YEAR, to a proper date during cleaning.
- T2M (°C): Mean daily air temperature at 2 meters above the surface.
- T2M_MAX (°C): Maximum daily temperature at 2 meters.
- T2M_MIN (°C): Minimum daily temperature at 2 meters.
- T2M_RANGE (°C): Daily temperature range (T2M_MAX − T2M_MIN).
- PRECTOTCORR (mm/day): Bias-corrected total daily precipitation.
- RH2M (%): Relative humidity at 2 meters.
- WS2M (m/s): Mean daily wind speed at 2 meters.
- WS2M_MAX (m/s): Maximum daily wind speed at 2 meters.
- PS (kPa): Atmospheric surface pressure.
- QV2M (g/kg): Specific humidity, the mass of water vapor per unit mass of moist air.
- Python Programming: Task-specific programming assignments.
- GitHub Commands: Continuous committing and repository management.
- Data Understanding and Exploration: Applying exploratory data analysis techniques.
- CI/CD: Understanding continuous integration and continuous deployment.
- Streamlit: Creating a dashboard using Streamlit.
- Challenge Introduction: 9:30 AM UTC on Wednesday, 22 Apr 2026.
- Interim Submission: 8:00 PM UTC on Sunday, 26 Apr 2026.
- Final Submission: 8:00 PM UTC on Tuesday, 28 Apr 2026.
Slack channel: #all-week0 Office hours: Mon–Fri, 08:00–15:00 UTC
Objective: Get everyone comfortable with version control before touching data.

Initialize Repository
- Create a new GitHub repo named climate-challenge-week0.
- Clone it locally and set up a Python virtual environment (venv or conda).

Branching & Commits
- Create a branch called setup-task.
- Commit at least 3 times, with messages following the Conventional Commits convention, e.g. "init: add .gitignore", "chore: venv setup", "ci: add GitHub Actions workflow".
- Your branch must include:
  - .gitignore (covering data/, any .csv files, and .ipynb_checkpoints/)
  - requirements.txt
  - a GitHub Actions workflow (.github/workflows/ci.yml)

Basic CI
- Add a GitHub Actions workflow file (.github/workflows/ci.yml) that runs python --version or pip install -r requirements.txt on every push to the main branch.
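A minimal workflow along these lines would satisfy the Basic CI requirement (a sketch only: the action versions and Python version are assumptions, adjust to your setup):

```yaml
# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"   # assumption; pin whatever you develop against
      - run: pip install -r requirements.txt
```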
In README.md, document how to reproduce the environment. Merge setup-task into main via a Pull Request.
├── .vscode/
│   └── settings.json
├── .github/
│   └── workflows/
│       └── unittests.yml
├── .gitignore
├── requirements.txt
├── README.md
├── src/
├── notebooks/
│   ├── __init__.py
│   └── README.md
├── tests/
│   └── __init__.py
└── scripts/
    ├── __init__.py
    └── README.md
Dev environment set up and documented in README.md. CI workflow passes on push to main. At least 3 meaningful commits on the setup-task branch, merged via Pull Request.
Objective: Profile, clean, and conduct a focused exploratory data analysis on the climate dataset to extract meaningful insights about African climate trends in the lead-up to COP32.
Create branch: eda-<country> (e.g. eda-ethiopia)
Notebook: <country>_eda.ipynb
- Load the CSV using pd.read_csv("<country>.csv").
- Add a Country column with the country name.
- Convert the YEAR and DOY columns into a proper datetime column: pd.to_datetime(df["YEAR"] * 1000 + df["DOY"], format="%Y%j").
- Extract Month as a separate column; you will need it for seasonal analysis.
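The loading steps above can be sketched end to end. The file path in the comment is hypothetical, and a tiny inline frame stands in for the real CSV:

```python
import pandas as pd

# Tiny stand-in for the real per-country CSV, which would normally be
# loaded with something like pd.read_csv("data/ethiopia.csv") (path is
# an assumption -- use wherever you stored the download).
df = pd.DataFrame({"YEAR": [2015, 2016], "DOY": [1, 60], "T2M": [22.4, 25.1]})

df["Country"] = "Ethiopia"

# YEAR * 1000 + DOY yields integers like 2015001; casting to string
# makes "%Y%j" (4-digit year + zero-padded day-of-year) parse reliably.
df["Date"] = pd.to_datetime((df["YEAR"] * 1000 + df["DOY"]).astype(str),
                            format="%Y%j")
df["Month"] = df["Date"].dt.month
```

Note that day-of-year arithmetic handles leap years for free: DOY 60 in 2016 correctly resolves to 29 February.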
- Replace all occurrences of -999 with np.nan across the entire DataFrame. Do this before running any statistics; -999 is NASA's sentinel value for missing or out-of-range data.
- Run df.duplicated().sum() and drop any duplicate rows. Document how many were found and in which columns.
- Run df.describe() on all numeric columns and write a brief interpretation in a markdown cell below the output.
- Run df.isna().sum() and compute the percentage of missing values per column. List any column with >5% nulls and note what this might mean for the analysis.
- Outlier Detection & Basic Cleaning:
  - Compute Z-scores for T2M, T2M_MAX, T2M_MIN, PRECTOTCORR, RH2M, WS2M, and WS2M_MAX. Flag rows where |Z| > 3 and report their count.
  - Decide whether to drop, cap, or retain outlier rows, and document your reasoning clearly in a markdown cell.
  - Handle remaining missing values: apply forward-fill for weather variables, or drop rows if more than 30% of a row's values are missing. Document your decision.
- Export the cleaned DataFrame to data/<country>_clean.csv. Ensure data/ is in .gitignore and never commit CSVs.
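The cleaning pipeline above can be sketched as follows. The frame is a toy stand-in for the loaded per-country DataFrame, and the Z-score is shown for one column only:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the loaded per-country DataFrame.
df = pd.DataFrame({
    "T2M": [22.0, -999.0, 23.5, 23.5, 80.0],
    "PRECTOTCORR": [0.0, 4.2, -999.0, -999.0, 1.1],
})

# 1. Replace NASA's -999 sentinel with NaN before computing anything.
df = df.replace(-999.0, np.nan)

# 2. Count and drop exact duplicate rows.
n_dupes = df.duplicated().sum()
df = df.drop_duplicates()

# 3. Percentage of missing values per column.
pct_missing = df.isna().mean() * 100

# 4. Z-score outlier flag (|Z| > 3), shown here for T2M; repeat per column.
z = (df["T2M"] - df["T2M"].mean()) / df["T2M"].std()
n_outliers = (z.abs() > 3).sum()

# 5. Forward-fill remaining gaps in the weather variables.
df = df.ffill()
```

On this toy frame no row crosses |Z| > 3 because the sample is tiny; on the real daily series the threshold is meaningful.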
- Plot monthly average T2M as a line chart over the full period (2015–2026). Annotate the warmest and coolest months.
- Plot monthly total PRECTOTCORR as a bar chart. Identify and annotate peak rainy-season months.
- Comment in a markdown cell on any visible trends or anomalies.
- Correlation & Relationship Analysis:
  - Heatmap of correlations across all numeric columns.
  - Scatter plots: T2M vs. RH2M; T2M_RANGE vs. WS2M.
  - Identify and interpret the three strongest correlations in a markdown cell.
- Distribution Analysis:
  - Histogram of PRECTOTCORR (apply a log scale if heavily skewed) and comment on the distribution shape.
  - Bubble chart: T2M vs. RH2M, with bubble size = PRECTOTCORR.
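The monthly aggregations behind the first two charts reduce to a single resample. A sketch on synthetic daily data (the plotting calls themselves are left as comments):

```python
import numpy as np
import pandas as pd

# Synthetic daily year; the real data spans 2015-2026.
dates = pd.date_range("2015-01-01", periods=365, freq="D")
df = pd.DataFrame({
    "Date": dates,
    "T2M": 25 + 5 * np.sin(2 * np.pi * np.arange(365) / 365),
    "PRECTOTCORR": 2.0,
})

# Monthly mean temperature and monthly total precipitation in one pass.
monthly = df.set_index("Date").resample("MS").agg(
    {"T2M": "mean", "PRECTOTCORR": "sum"})

warmest = monthly["T2M"].idxmax()   # month to annotate as warmest
coolest = monthly["T2M"].idxmin()   # ... and coolest
# monthly["T2M"].plot() and monthly["PRECTOTCORR"].plot(kind="bar")
# then draw the line and bar charts respectively.
```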
- Correct handling of the NASA header, -999 sentinels, and duplicate rows, all documented.
- Cleaned CSV exported per country and excluded from GitHub.
- Time series, correlation, and distribution plots, each accompanied by a written interpretation.
- Proactivity in self-learning, with the references used shared.

Task 3: Cross-Country Comparison & Climate Vulnerability Ranking
Objective: Synthesize the cleaned datasets from all five countries to identify relative climate vulnerability and produce a data-driven country ranking to inform Ethiopia's COP32 position paper.
Branch: compare-countries Notebook: compare_countries.ipynb
- Load each country's cleaned CSV (data/ethiopia_clean.csv, etc.) and concatenate into a single DataFrame.
- Temperature Trend Comparison:
  - Plot monthly average T2M for all five countries on a single line chart (one line per country, 2015–2026).
  - Summary table comparing mean, median, and standard deviation of T2M across countries.
- Precipitation Variability Comparison:
  - Side-by-side boxplots of PRECTOTCORR for all five countries.
  - Summary table comparing mean, median, and standard deviation of PRECTOTCORR across countries.
- Extreme Event Frequency:
  - For each country, count the number of days per year where T2M_MAX exceeds 35°C (extreme heat).
  - For each country, count the number of consecutive dry days per year (days where PRECTOTCORR < 1 mm).
  - Visualize both as bar charts, one per metric, colored by country.
- Statistical Testing (optional but recommended):
  - Run a one-way ANOVA (or Kruskal–Wallis) test on T2M values across all five countries to assess whether differences are statistically significant. Briefly note the p-values and what they suggest.
- Vulnerability Ranking & Key Observations:
  - Produce a summary table ranking the five countries by climate vulnerability, using the evidence gathered above (temperature trends, precipitation variability, and extreme event frequency).
  - Write a markdown cell and report with 5 bullet points framing your findings for COP32, structured as follows:
    1. Which country is warming fastest, and what does the trend suggest?
    2. Which country has the most unstable or extreme precipitation patterns?
    3. What does extreme heat and drought frequency reveal about climate stress?
    4. How does Ethiopia's climate profile compare to its neighbors?
    5. Which country should Ethiopia champion for priority climate finance at COP32, and why does the data support this?
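Both extreme-event metrics above can be sketched with pandas groupbys. Note that "consecutive dry days per year" is ambiguous in the task statement; this sketch interprets it as the longest dry spell per year, which is the usual drought metric. The five-day frame is a toy stand-in:

```python
import pandas as pd

# Toy stand-in; the real data has daily rows per country.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03",
                            "2015-01-04", "2015-01-05"]),
    "T2M_MAX": [36.0, 34.0, 37.5, 33.0, 38.0],
    "PRECTOTCORR": [0.0, 0.2, 0.0, 5.0, 0.1],
})
df["Year"] = df["Date"].dt.year

# Extreme-heat days: count days per year with T2M_MAX above 35 degC.
heat_days = (df["T2M_MAX"] > 35).groupby(df["Year"]).sum()

# Dry-spell length: a day is "dry" when PRECTOTCORR < 1 mm. Each wet
# day increments a run counter, so every dry streak shares one run id;
# summing per (year, run) and taking the max gives the longest spell.
dry = df["PRECTOTCORR"] < 1
run_id = (~dry).cumsum().rename("run")
longest_dry_spell = (dry.groupby([df["Year"], run_id]).sum()
                        .groupby(level=0).max())
```

On the real data you would group by country as well; add "Country" to both groupby key lists.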
- All five countries included in every comparison plot.
- Summary tables present for both temperature and precipitation with mean, median, and standard deviation.
- Correct implementation and reporting of statistical-test p-values.
- A clearly reasoned vulnerability ranking table supported by the data.
- Relevance and actionability of the 5 COP32-framed observations.

Bonus (Optional): Interactive Dashboard
Objective: Build a Streamlit app (code only, no data) to visualize your insights.
- Design and develop a dashboard using Streamlit to visualize the dataset with interactive elements.
- Integrate Python scripts to fetch and process data dynamically.
- Implement interactive features such as:
  - A country selector (multi-select) to filter plots by country.
  - A year-range slider to zoom into specific periods.
  - A variable-selector dropdown (e.g., T2M, PRECTOTCORR, RH2M).
- Deploy the Streamlit dashboard to Streamlit Community Cloud.
├── app/
│   ├── __init__.py
│   ├── main.py      # main Streamlit application script
│   └── utils.py     # utility functions for data processing and visualization
└── scripts/
    ├── __init__.py
    └── README.md
- Create branch: dashboard-dev
- App: app/main.py with:
  - Country multi-select widget.
  - Year range slider.
  - Temperature trend line chart.
  - Precipitation distribution boxplot.
- Git hygiene: keep data/ ignored; the app reads local CSVs.
- Commit & PR: at least 1 commit (e.g., "feat: basic Streamlit UI"). Document the development process and usage instructions in README.md.
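Whatever widgets you choose, they ultimately reduce to one filtering step. A testable helper of this shape (the function name is hypothetical) could live in app/utils.py and keep the Streamlit layer thin:

```python
import pandas as pd

def filter_climate(df: pd.DataFrame, countries: list,
                   year_range: tuple) -> pd.DataFrame:
    """Return rows for the selected countries within the year range.

    In app/main.py this would be fed by the widgets, e.g.:
        countries = st.multiselect("Country", options=all_countries)
        year_range = st.slider("Years", 2015, 2026, (2015, 2026))
    """
    lo, hi = year_range
    mask = df["Country"].isin(countries) & df["Date"].dt.year.between(lo, hi)
    return df.loc[mask]

# Tiny demonstration frame (hypothetical values).
demo = pd.DataFrame({
    "Country": ["Ethiopia", "Kenya"],
    "Date": pd.to_datetime(["2016-05-01", "2020-01-01"]),
})
selected = filter_climate(demo, ["Ethiopia"], (2015, 2018))
```

Keeping the filter pure, with no Streamlit imports, means it can be unit-tested in tests/ without running the app.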
- Dashboard Usability: intuitive navigation and clear labels.
- Interactive Elements: effective use of Streamlit widgets.
- Visual Appeal: clean, professional design that communicates insights clearly.
- Deployment Success: fully functional deployment accessible via a public URL.
Interim Submission (Sunday, 26 Apr 2026, 8:00 PM UTC). What to submit:
- GitHub link to your main branch.
- An interim report covering:
  - Task 1 summary (Git & environment setup)
  - Task 2 approach (data profiling and cleaning outline)

Final Submission (Tuesday, 28 Apr 2026, 8:00 PM UTC). What to submit:
- GitHub link to your main branch.
- A final report covering all Week 0 work, written in a Medium-blog style (PDF format).
- (Optional) Place your dashboard screenshot in the repository (e.g., under dashboard_screenshots/).

Other Considerations:
- Documentation: detailed documentation in code and report writing is encouraged.
- Collaboration: collaborate through GitHub issues and projects.
- Communication: regular check-ins, Q&A sessions, and a supportive community atmosphere.
- Flexibility: acknowledge potential challenges and communicate proactively.
- Professionalism: maintain work ethics and professional behavior.
- Time Management: be punctual and manage your time effectively.
In the following, the color purple indicates morning sessions, and non-purple indicates afternoon sessions.
Day 1 (Wednesday 22 Apr 2026):
- Introduction to the Challenge (Kerod)
- Python Environment, Git & GitHub Basics + CI/CD (Mahbubah)
Day 2 (Thursday 23 Apr 2026):
- Data Science Workflow & CRISP-DM Basics (Mahbubah)
- Data Quality, Profiling, and Exploratory Data Analysis Techniques (Feven)
Day 3 (Friday 24 Apr 2026):
- Dashboard development using Streamlit (Feven)
Day 4 (Monday 27 Apr 2026): Q&A
Day 5 (Tuesday 28 Apr 2026): Q&A

Feedback: You will receive comments/feedback in addition to a grade.
The business scenario in this challenge asks for "evidence-backed insights" that could inform a government position paper. That phrase has a specific meaning in climate policy work, and the distance between exploratory data analysis and negotiation-grade evidence is wider than it first appears.
This appendix points to the documents that define what negotiation-grade climate evidence looks like in the African policy context, isolates the visualization and analytical patterns they share, and maps those patterns onto the variables available in the provided dataset. Use it as a reading list and as a self-check against your own final report.

Reference documents
Study these five documents before finalizing your analysis. You do not need to read them cover-to-cover. The point is to internalize their shape — how they frame claims, what they visualize, and what they deliberately omit.
- WMO State of the Climate in Africa 2024
  Report landing page: https://wmo.int/publication-series/state-of-climate-africa-2024
  Direct PDF: https://wmo.int/sites/default/files/2025-05/Africa_2024final1.pdf
  Companion story map: https://wmo.int/publication-series/state-of-climate-africa
  This is the single most important reference in this list. Every major claim in African climate negotiations traces back to some edition of this annual report series. Pay attention to:
  - Annual regional mean temperature anomalies for Africa, expressed against a 1991–2020 baseline, reported using six independent datasets (Berkeley Earth, ERA5, GISTEMP, HadCRUT5, JRA-3Q, NOAAGlobalTemp v6).
  - The six-subregion decomposition of the continent: North, West, Central, East, Southern Africa, and the Indian Ocean islands.
  - Precipitation anomaly maps using divergent color scales against a fixed climatological reference.
  - SPI12 drought severity maps with explicit reference periods.
  - The pattern of welding every climate statistic to an impact statistic: aggregate cereal yields in Southern Africa 16% below the five-year average in 2024, Moroccan output down 42%, the Lake Kariba hydropower collapse, and so on.
- WMO State of the Global Climate Update for COP30
  Landing page: https://wmo.int/publication-series/state-of-climate-update-cop30
  Direct PDF: https://wmo.int/sites/default/files/2025-11/State%20of%20the%20Climate%202025%20Update%20COP30%20(31%20oct).pdf
  This document was released at the Leaders' Summit in Belém as the authoritative scientific anchor for COP30 negotiations. Read it as an exemplar of compression — a short document that consolidates temperature, greenhouse gas concentration, ocean heat, sea ice, and extreme-event evidence into a form that fits on a negotiator's desk. The headline finding — that 2015–2025 forms the eleven warmest years in the 176-year observational record, with early 2025 at 1.42 °C ± 0.12 °C above pre-industrial — is reported with explicit uncertainty and multi-dataset backing. Note the restraint: no speculation, no policy prescription, just authoritative numbers with provenance.
- World Bank Climate Risk Country Profiles
  Ethiopia: https://climateknowledgeportal.worldbank.org/sites/default/files/2021-05/15463A-WB_Ethiopia%20Country%20Profile-WEB.pdf
  Kenya: https://climateknowledgeportal.worldbank.org/sites/default/files/2021-05/15724-WB_Kenya%20Country%20Profile-WEB.pdf
  Interactive portal covering all five countries in the dataset, including Sudan, Nigeria, and Tanzania: https://climateknowledgeportal.worldbank.org/country/ethiopia/climate-data-historical
  These profiles are the most directly reproducible template for the kind of country-level analysis this challenge asks for. Each uses the same structure — historical mean climate, observed trends, extremes, projections — and publishes warming stripes, monthly climatology plots, and seasonal cycle charts that can be rebuilt almost directly from the daily variables in the provided dataset.
- Power Shift Africa — COP30 Scorecard https://www.powershiftafrica.org/publications/cop30scorecard Read this as a counterpoint to the WMO reports. Where WMO produces the evidence, Power Shift Africa converts evidence into a negotiation-outcome evaluation across climate finance, adaptation, loss and damage, just transition, and trade measures. This document makes the pipeline visible: scientific report, to position evidence, to negotiation demand, to outcome audit. A country-level analysis like the one this challenge asks for is supposed to feed something like this downstream.
- Addis Ababa Declaration and the Second Africa Climate Summit (ACS2)
  Analysis and context: https://issafrica.org/iss-today/can-africa-shift-from-victim-to-player-at-cop30
  ACS2 was held in Addis Ababa in September 2025 as the continental pre-COP30 summit. It adopted the Addis Ababa Declaration as Africa's consolidated position going into Belém. This is the closest analogue to the kind of document the business scenario imagines your analysis would inform. Read it for the rhetorical frame — the explicit shift from Africa-as-victim to Africa-as-strategic-player — that any evidence package needs to support.

Common visualization vocabulary
Strip the five documents down and a narrow, repeated set of visual patterns emerges. These are the conventions your final charts should follow.
- Anomaly time series against a fixed baseline (1991–2020), not raw values.
- Warming stripes or line plots with multi-dataset envelopes.
- Divergent-color anomaly maps for rainfall and temperature against climatology: a single year framed against a 30-year normal.
- Subregional aggregations (East Africa, Sahel, Horn) used in preference to national means when the climate signal is regional rather than national.
- Index-based drought and wetness maps (SPI, SPEI) that collapse noisy daily rainfall into categorical, policy-legible bands.
- Extreme-event inventories (tables of date, location, duration, and documented impact) that convert statistical distributions into concrete narratives.
- The impact weld: every climate number is paired with an agriculture, water, GDP, displacement, or health statistic. Without this weld, the chart is exploratory. With it, the chart is argument.

Note equally what does not appear in these documents: scatter plots of raw variables, correlation heat maps, and long tables of summary statistics. Those are intermediate analytical outputs, not negotiation outputs. They belong during exploration but not in a final deliverable written for policy readers.
Mapping the provided dataset to these patterns
The NASA POWER variables in the provided dataset are the raw material for most of the chart types listed above. Suggested mappings:
- T2M with YEAR and DOY: annual mean temperature anomaly versus a 1991–2020 baseline, per country; warming stripes. This is the direct analogue of Figure 2 in the WMO Africa report, and the single most important chart to produce.
- T2M_MAX: heatwave day counts (days above a local 90th percentile, or above fixed thresholds like 35 °C), as an annual series.
- T2M_MIN: tropical-night counts (nights above 20 °C), relevant for any human-health framing.
- T2M_RANGE: diurnal temperature range trend. A declining DTR is one of the cleaner fingerprints of greenhouse-gas-driven warming and is underused in descriptive climate reports; including it signals analytical depth.
- PRECTOTCORR: monthly totals, then SPI-3 and SPI-12 drought indices. The 2020–2023 Horn of Africa drought and the 2024 East African long-rains flooding both live in this variable for Kenya, Ethiopia, and Tanzania.
- PRECTOTCORR (seasonal structure): rainy-season onset and cessation dates per year, defined by the first and last dates of sustained rainfall above a locally appropriate threshold. Shifts in these dates are the most farmer-legible form of climate evidence and translate directly into agricultural-planning narratives.
- RH2M combined with T2M_MAX: heat-index or wet-bulb-temperature proxy. Links directly to labor productivity and outdoor-work feasibility framing.
- QV2M: absolute humidity trends. A cleaner long-term climate signal than relative humidity because it is not confounded by temperature.
- WS2M_MAX: extreme-wind events, relevant for dust, Harmattan, and coastal-storm framing in Sudan and Nigeria.
- PS: primarily a quality-control variable. Unusual values often indicate station-elevation metadata issues rather than meaningful meteorological signal.

Three-layer framework for evaluating your own figures
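The anomaly computation behind the first mapping is a one-liner once annual means exist. One caveat: the provided dataset starts in 2015, so the WMO's 1991–2020 baseline cannot be computed from it; a within-dataset baseline (here the 2015–2020 mean, an assumption you should state on the chart) is a workable stand-in. The annual values below are invented for illustration:

```python
import pandas as pd

# Hypothetical annual mean T2M per year for one country; in practice
# this comes from the daily data, e.g. df.groupby("YEAR")["T2M"].mean().
annual = pd.Series({2015: 24.1, 2016: 24.3, 2017: 24.2, 2018: 24.5,
                    2019: 24.6, 2020: 24.4, 2021: 24.8, 2022: 25.0})

# Within-dataset baseline (2015-2020 mean) standing in for 1991-2020,
# which the dataset's time span cannot support.
baseline = annual.loc[2015:2020].mean()

# Anomaly series: the input for warming stripes or an anomaly bar chart,
# typically coloured red for positive and blue for negative values.
anomaly = annual - baseline
```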
Before including any figure in the final report, ask whether it answers each of these three questions in sequence:
1. What is changing? A trend or anomaly, expressed against an explicit baseline, with some representation of uncertainty.
2. What has it caused? An impact statistic (yields, displacement, GDP, disease burden, hydropower output), even when sourced from secondary literature. The chart must carry the impact alongside the climate signal.
3. What does it demand? The policy or finance ask the evidence supports: adaptation finance, early warning systems, loss and damage, technology transfer, capacity building.

A figure that answers only (1) is exploratory data analysis. A figure that answers (1) and (2) is a climate report. A figure that answers all three is a position paper. That ladder is the distinction between an EDA notebook and the deliverable this challenge asks for.
Suggested priority reading
If time is limited, read in this order:
1. WMO State of the Climate in Africa 2024: Figures 2, 3, 5, and 11, plus the extreme-events inventory.
2. World Bank Climate Risk Country Profile for Ethiopia, for the structural template, plus the same for one other country in the dataset for cross-comparison.
3. Addis Ababa Declaration summary, to understand the rhetorical frame the evidence is expected to support.
4. WMO State of the Global Climate Update for COP30, as a compression exercise in how much authoritative content fits in a short document.
5. Power Shift Africa COP30 Scorecard, to see how climate evidence becomes a negotiation-outcome audit.
- NASA POWER Data Access Viewer
- World Bank Climate Change Knowledge Portal
- IPCC Sixth Assessment Report - Africa Chapter
- COP32 Ethiopia - African Group of Negotiators

- Object Oriented Programming - Real Python
- Python Courses and Tutorials (python-course.eu)

- Get started - Streamlit Docs
- Streamlit 101: An in-depth introduction - Towards Data Science
- Streamlit Community Cloud

- A Gentle Introduction to Unit Testing in Python - Machine Learning Mastery
- The Hitchhiker's Guide to Python - Testing
- Getting Started with Testing in Python - Real Python

- What is version control - Atlassian
- Learn Git Branching - Interactive
- Which files to not track - Atlassian
- Conventional Commits

- What is Continuous Integration - Atlassian
- DevOps Pipeline - Atlassian
- Setting up a CI/CD Pipeline on GitHub

- What is Data Engineering - AltexSoft
- Exploratory Data Analysis - Towards Data Science