# Vietnam War: what does the data tell us? 📊

**Data Exploration and Visualization With Python**

![](https://static01.nyt.com/images/2017/09/17/insider/15vietnamreadsmain/merlin-to-scoop-126232361-248774-master768.jpg)

## Table of Contents

* [1.  Importing dataset and libraries](#1.-Importing-dataset-and-libraries)
    * [1.1 Importing libraries](#1.1-Importing-libraries)
    * [1.2 Importing datasets](#1.2-Importing-datasets)
    * [1.3 Summarize datasets](#1.3-Summarize-datasets)
*  [2. Data Visualization](#2.-Data-Visualization)
    * [2.1 Which country have the most missions during the war ?](#2.1-Which-country-have-the-most-missions-during-the-war-?)
    * [2.2 Where are located the missions ?](#2.2-Where-are-located-the-missions-?)
    
## 1. Importing dataset and libraries

### 1.1 Importing libraries

In [4]:
# Loading libraries 
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
%matplotlib inline

### 1.2 Importing datasets

In [5]:
bombing_operation = pd.read_csv("../input/THOR_Vietnam_Bombing_Operations.csv")
aircraft_glossary = pd.read_csv("../input/THOR_Vietnam_Aircraft_Glossary.csv", encoding = "ISO-8859-1")
weapons_glossary = pd.read_csv("../input/THOR_Vietnam_Weapons_Glossary.csv", encoding = "ISO-8859-1")
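
The two glossary files are read with `encoding = "ISO-8859-1"`, presumably because they contain bytes that are not valid UTF-8. The sketch below reproduces that situation on a tiny in-memory CSV; the column names and values are invented for illustration and do not come from the dataset.

```python
import io

import pandas as pd

# A made-up two-line CSV encoded as ISO-8859-1; the accented letter becomes
# a single byte (0xE9) that is not valid UTF-8 in this position.
raw = "NAME,NOTE\nF-4,caf\u00e9\n".encode("ISO-8859-1")

# With the default UTF-8 decoding, pandas cannot read the file.
try:
    pd.read_csv(io.BytesIO(raw))
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Telling pandas the real encoding makes the same bytes readable.
df = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")
print("UTF-8 read succeeded:", utf8_ok)
print(df)
```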

### 1.3 Summarize datasets

In [6]:
print("----- SHAPE OF DATASET -----")
print("BOMBING OPERATION : ",bombing_operation.shape)
print("AIRCRAFT GLOSSARY : ",aircraft_glossary.shape)
print("WEAPONS GLOSSARY : ",weapons_glossary.shape)

In [127]:
bombing_operation.head(2)

In [7]:
aircraft_glossary.head(2)

In [8]:
weapons_glossary.head(2)
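
Besides `shape` and `head`, it can help to check how many values are missing in each column, since the mapping step later has to drop rows without coordinates. Here is a minimal sketch on a toy frame: the column names are taken from the dataset, but the values and the helper `missing_share` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for bombing_operation; the values are invented.
toy = pd.DataFrame({
    "COUNTRYFLYINGMISSION": ["UNITED STATES OF AMERICA", None, "LAOS",
                             "UNITED STATES OF AMERICA"],
    "TGTLATDD_DDD_WGS84": [16.9, np.nan, 17.2, np.nan],
})

def missing_share(df):
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)

summary = missing_share(toy)
print(summary)
```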

## 2. Data Visualization

### 2.1 Which country have the most missions during the war ?



**NOTE:** We only need the *COUNTRYFLYINGMISSION* column: it holds the country, and counting how often each country appears gives us the result we are looking for.

In [61]:
# Count how many mission records each country has
countries_mission = bombing_operation["COUNTRYFLYINGMISSION"]
count = countries_mission.value_counts()
countries, y = count.index.tolist(), count.values

plt.figure(figsize=(15, 10))
ax = sns.barplot(x=countries, y=y, palette=sns.cubehelix_palette(len(countries)))
plt.xlabel('Countries')
plt.ylabel('Number of missions')
plt.title('Number of missions by country')
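
Raw counts can be hard to compare when one country dominates, so it may be useful to look at percentages as well. The sketch below uses `value_counts(normalize=True)` on a toy series; the numbers are invented and are not the real distribution.

```python
import pandas as pd

# Toy stand-in for bombing_operation["COUNTRYFLYINGMISSION"]; counts are invented.
countries_mission = pd.Series(
    ["UNITED STATES OF AMERICA"] * 6 + ["VIETNAM (SOUTH)"] * 3 + ["LAOS"]
)

# normalize=True returns fractions instead of counts; convert them to percent.
share = countries_mission.value_counts(normalize=True).mul(100).round(1)
print(share)
```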

### 2.2 Where are located the missions ?

First we isolate the 3 columns that indicate which country flew the mission and the target latitude and longitude. Then we remove the rows where any of these values is missing.

In [101]:
mission_lat_lon = bombing_operation[['COUNTRYFLYINGMISSION', 'TGTLATDD_DDD_WGS84', 'TGTLONDDD_DDD_WGS84']]
mission_lat_lon = mission_lat_lon.rename(columns={"COUNTRYFLYINGMISSION": "country", "TGTLATDD_DDD_WGS84": "latitude", "TGTLONDDD_DDD_WGS84": "longitude"})
# Drop rows that are missing the country or either coordinate
mission_lat_lon = mission_lat_lon.dropna(subset=['country', 'latitude', 'longitude'])
print(mission_lat_lon.head())

To speed up drawing the map, we round the latitude and longitude to two decimal places and drop the duplicate rows.

In [102]:
mission_lat_lon['latitude'] = mission_lat_lon['latitude'].round(2)
mission_lat_lon['longitude'] = mission_lat_lon['longitude'].round(2)
print("BEFORE DROP DUPLICATES : ",mission_lat_lon.shape)
mission_lat_lon = mission_lat_lon.drop_duplicates()
print("AFTER DROP DUPLICATES : ",mission_lat_lon.shape)
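
To see what this step does: 0.01° of latitude is roughly 1.1 km, so points that close together collapse into a single row after rounding, as long as they belong to the same country. A small sketch with invented coordinates and placeholder country names:

```python
import pandas as pd

# Invented coordinates; "A" and "B" are placeholder country names.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B"],
    "latitude": [16.9012, 16.9049, 17.2500, 16.9012],
    "longitude": [107.1001, 107.1032, 106.5000, 107.1001],
})

# Round both coordinates to two decimals, then drop identical rows.
rounded = toy.assign(
    latitude=toy["latitude"].round(2),
    longitude=toy["longitude"].round(2),
)
deduped = rounded.drop_duplicates()

# The first two "A" rows merge; the "B" row stays because the country differs.
print(len(toy), "rows before,", len(deduped), "rows after")
```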

Now we create a color dictionary: red for UNITED STATES OF AMERICA and blue for every other country.

In [113]:
# Red for the United States, blue for every other country
col = {c: 'red' if c == 'UNITED STATES OF AMERICA' else 'blue' for c in countries}
print(col)
mission_lat_lon['colors'] = mission_lat_lon['country'].map(col)
print(mission_lat_lon.head())
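
The same red/blue assignment can also be computed directly from the column with `np.where`, without going through the `countries` list built in section 2.1. This is only an alternative sketch on a toy frame with invented rows:

```python
import numpy as np
import pandas as pd

# Toy stand-in for mission_lat_lon; the rows are invented.
toy = pd.DataFrame({
    "country": ["UNITED STATES OF AMERICA", "LAOS", "UNITED STATES OF AMERICA"],
})

# np.where picks 'red' where the condition holds and 'blue' everywhere else.
toy["colors"] = np.where(toy["country"] == "UNITED STATES OF AMERICA", "red", "blue")
print(toy)
```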

And we plot the map! 👌

In [None]:
fig = plt.figure(figsize=(15, 10))
m = Basemap(projection='lcc', resolution='h',
            width=5E6, height=5E6, 
            lat_0=16, lon_0=100)
m.etopo(scale=0.5, alpha=0.5)
m.drawcountries()
m.drawcoastlines()

lon = mission_lat_lon['longitude'].values
lat = mission_lat_lon['latitude'].values
col = mission_lat_lon['colors'].values

# Project (longitude, latitude) to map (x, y) coordinates for plotting
lons, lats = m(lon, lat)
# Plot one small dot per location, colored by country (red = United States, blue = others)
m.scatter(lons, lats, marker = 'o', color=col, s=1)
plt.show()
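
If Basemap is not available in your environment, a plain matplotlib scatter of longitude against latitude gives a rough preview of the same points. Note that this swaps the map projection for simple axes, so there is no coastline or relief and distances are distorted. The frame below is a toy stand-in with invented coordinates.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for mission_lat_lon; coordinates and colors are invented.
toy = pd.DataFrame({
    "latitude": [16.90, 17.25, 10.80],
    "longitude": [107.10, 106.50, 106.70],
    "colors": ["red", "blue", "red"],
})

fig, ax = plt.subplots(figsize=(8, 8))
# One small dot per location, colored by the precomputed 'colors' column.
ax.scatter(toy["longitude"], toy["latitude"], c=toy["colors"].tolist(), s=1)
ax.set_xlabel("Longitude (degrees)")
ax.set_ylabel("Latitude (degrees)")
ax.set_title("Mission locations (toy data, no map projection)")
```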

I'm a beginner, so if you have any remarks I will be glad to hear them! 👍


**TO BE CONTINUED...**