AI and Defense Strategy: Text Analysis

This Python project analyzes national AI and defense strategy documents using zero-shot text classification. The project focuses on Southeast Asia and nearby countries, specifically Australia, Indonesia, Malaysia, Singapore, Thailand, and Vietnam.

Getting Started

python main.py

Usage

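import os
from textanalysis import analysis

# Path to one of the strategy PDFs in data/policies
path = os.path.join(os.getcwd(), 'data', 'policies', 'australia_defense.pdf')
# Extract the text from the PDF, then classify it by topic and sentiment
temp = analysis.extract_pdfs(path)
df, fig = analysis.analyze_corpus(temp)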

analyze_corpus returns a dataframe of text chunks classified by topic and sentiment, together with an interactive plot of the topic and sentiment for each chunk.
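
To make the idea of a text chunk concrete, the sketch below splits extracted text into fixed-size word windows. It is only an illustration: the chunking rule that analyze_corpus actually applies is not described here, and chunk_text and max_words are hypothetical names.

```python
from typing import List


def chunk_text(text: str, max_words: int = 100) -> List[str]:
    """Split text into consecutive chunks of at most max_words words.

    Hypothetical helper for illustration; not part of the textanalysis package.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    # Step through the word list in windows of max_words.
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
```

Each chunk could then be classified on its own, which is what makes a per-chunk plot of topic and sentiment possible.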

Algorithm Details

This code uses the facebook/bart-large-mnli model from Hugging Face. It is a BART-large model fine-tuned on MultiNLI and is used here for zero-shot text classification.
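
As a rough sketch of zero-shot classification with this model, the snippet below uses the transformers zero-shot-classification pipeline. The candidate labels and the helper names build_zero_shot_classifier and top_topic are assumptions made for illustration; the labels this project actually uses are not listed here.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical labels for illustration; not the project's actual label set.
CANDIDATE_LABELS = ["national security", "economic growth", "ethics and governance"]


def build_zero_shot_classifier():
    """Load the Hugging Face zero-shot pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily because it is a heavy dependency

    return pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


def top_topic(text: str, labels: Sequence[str], classifier: Callable) -> Tuple[str, float]:
    """Return the best-scoring label and its score for one text chunk."""
    result = classifier(text, candidate_labels=list(labels))
    # The pipeline returns parallel "labels" and "scores" lists.
    score, label = max(zip(result["scores"], result["labels"]))
    return label, score
```

Calling top_topic(chunk, CANDIDATE_LABELS, build_zero_shot_classifier()) would label a single chunk. Passing the classifier in as an argument keeps the model load separate, so it happens once rather than per chunk.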

This code also uses the distilbert-base-uncased-finetuned-sst-2-english model from Hugging Face. It is a DistilBERT model fine-tuned on SST-2 and is used here for sentiment classification.
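
A minimal sketch of sentiment scoring with this model follows, using the transformers sentiment-analysis pipeline. Folding the label and confidence into one signed number is an assumption made here for plotting convenience, not necessarily how this project stores sentiment; build_sentiment_classifier and signed_sentiment are hypothetical names.

```python
from typing import Callable


def build_sentiment_classifier():
    """Load the Hugging Face sentiment pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily because it is a heavy dependency

    return pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )


def signed_sentiment(text: str, classifier: Callable) -> float:
    """Return the confidence as a signed score: positive for POSITIVE, negative otherwise."""
    # For a single string, the pipeline returns a list holding one result dict.
    result = classifier(text)[0]
    score = result["score"]
    return score if result["label"] == "POSITIVE" else -score
```

Note that the model accepts a limited number of tokens per input, so long passages would need to be split or truncated before scoring.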

distilbert-base-uncased-finetuned-sst-2-english has strong evaluation results in terms of accuracy and precision. However, it is also subject to risks, limitations, and biases.

Data

The national-level AI strategies or policies for GPAI and each country under consideration are included as PDF files in the data/policies directory. Text-only versions of those policies are included as .txt files in the data/texts directory.

The membership assessment metrics for the Global Partnership on Artificial Intelligence (GPAI) are included in the data/metrics directory. This directory includes the source documents and consolidated metrics for the countries under consideration. The metrics are defined in the 2021 GPAI Frame for letter of intent and reference metrics to support the assessment of GPAI Membership (also available in the same directory). The datasets are organized with the following identifiers:

Identifier	Dataset
aidv	AI and Democratic Values Index
aigs	AI Global Surveillance Index
aii	Stanford AI Index
cri	Commitment to Reducing Inequality Index
di	Democracy Index
gai	Global AI Index
gair	Government AI Readiness Index
gfs	Global Freedom Score
libdem	V-Dem Liberal Democracy Index
odi	Open Data Index
ttaip	Total number of 10% top-cited AI scientific publications, fractional counts

Intermediate data files and output figures and tables are included in the data/output directory.

Results

Exploratory analysis suggests that the approach is feasible. The following figure shows the sentiment and topic classification across Singapore's National AI Strategy.

License
