![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banner_Top_06.06.18.jpg?raw=true)

# Languages: An important identity in a globalizing world

Language is an *important* part of a people's identity. It not only provides a medium of communication but also binds a community together. It is therefore essential to preserve linguistic diversity in a globalizing world.

Let us look at the languages that once existed or still exist on Earth. [UNESCO](https://en.unesco.org/) has compiled an [atlas](http://www.unesco.org/languages-atlas/index.php?hl=en&page=atlasmap) that contains various statistics about languages. A [choropleth map](https://en.wikipedia.org/wiki/Choropleth_map) is used to visualize this linguistic geographical dataset.

In [None]:
# Comment out the following lines if the modules are already installed
!pip install googletrans
!pip install gTTS

# Import python libraries
import pandas as pd
import plotly.graph_objects as go
from ipywidgets import interact, fixed, widgets, Layout, Button, Box, HBox, VBox
import googletrans
import gtts
from IPython.display import Audio, display

# Don't show warnings in output
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Import the dataset
df = pd.read_csv('../Data/languoid.csv')

# Data clean up - keep required columns
df = df[df['level'] == 'language'] \
       [['name','child_dialect_count','latitude','longitude','status']] \
       .dropna()   # remove rows with missing entries

# Display top 5 rows
df.head()

In [None]:
# This cell will take at least a couple of minutes to run because it plots more than 7,000 languages

# Define color scheme
colors = {'safe':'rgb(61,145,64)', 'vulnerable':'rgb(31,117,254)', 'definitely endangered':'rgb(137,207,240)', \
          'severely endangered':'rgb(255,191,0)', 'critically endangered':'rgb(255,0,0)', 'extinct':'rgb(0,0,0)'}

# Create a plotly figure object (like an empty figure)
fig = go.Figure()

# Create a marker with desired properties for each language in the dataset
for i in range(0,df.shape[0]):
    df_row = df.iloc[i]   # Pass each row in the dataset
    fig.add_trace(go.Scattergeo(
        lon = [df_row['longitude']],   # longitude
        lat = [df_row['latitude']],   # latitude 
        text = 'Dialects: {0} <br>Status: {1}'.format(df_row['child_dialect_count'],df_row['status']),   # text accompanying the dataset
        name = df_row['name'],   # name of the language
        hoverinfo = 'name + text',   # specify information to be shown when a pointer is hovered over a marker
        marker = dict(
            size=9,
            color=colors[df_row['status']],
            line_width=0),   # marker properties
        showlegend = False   # remove legend
        )
    )
    
# Update figure properties    
fig.update_layout(
    
    # Add title (see how hyperlink is added)
    title_text = 'Languages around the world<br>\
Source: <a href="http://www.unesco.org/languages-atlas/index.php?hl=en&page=atlasmap">\
UNESCO</a>',
    
    # Other geographical properties of the map
    geo = dict(
        resolution=50,
        showcoastlines=True,
        showframe=False,
        showland=True,
        landcolor="lightgray",
        showcountries=True,
        countrycolor="white" ,
        coastlinecolor="white",
        bgcolor='rgba(255, 255, 255, 0.0)'
    )
)

# Show the figure
fig.show()

Hover over the markers on the map to see the dialect count and status of each language. Feel free to zoom in or out to find languages in a country of your choice.
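
The map cell adds one trace per language, which is part of why it is slow. A possible alternative, sketched here under assumptions, is to group the rows by status and prepare one set of marker properties per status. `build_status_traces` is a hypothetical helper (not part of the notebook), and the toy rows are illustrative values standing in for the cleaned dataset.

```python
import pandas as pd

# Same status-to-color mapping as in the map cell
colors = {'safe': 'rgb(61,145,64)', 'vulnerable': 'rgb(31,117,254)',
          'definitely endangered': 'rgb(137,207,240)',
          'severely endangered': 'rgb(255,191,0)',
          'critically endangered': 'rgb(255,0,0)', 'extinct': 'rgb(0,0,0)'}

def build_status_traces(df, colors):
    """Hypothetical helper: return one dict of Scattergeo keyword arguments per status."""
    specs = []
    for status, group in df.groupby('status', sort=False):
        specs.append(dict(
            lon=group['longitude'].tolist(),
            lat=group['latitude'].tolist(),
            # One hover label per language, since the trace name is now the status
            text=['{0}<br>Dialects: {1}<br>Status: {2}'.format(n, d, status)
                  for n, d in zip(group['name'], group['child_dialect_count'])],
            name=status,
            hoverinfo='text',
            mode='markers',
            marker=dict(size=9, color=colors[status], line=dict(width=0)),
        ))
    return specs

# Toy rows standing in for the cleaned dataset (values are illustrative only)
toy = pd.DataFrame({
    'name': ['Language A', 'Language B', 'Language C'],
    'child_dialect_count': [3, 0, 1],
    'latitude': [10.0, -5.0, 45.0],
    'longitude': [20.0, 30.0, -75.0],
    'status': ['safe', 'extinct', 'safe'],
})

specs = build_status_traces(toy, colors)
print(len(specs))   # number of traces equals the number of distinct statuses
```

Each dictionary could then be passed to plotly with `fig.add_trace(go.Scattergeo(**spec))`, giving at most six traces instead of several thousand.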

### Questions:

1. Which two continents have the most safe languages?
2. Why do languages disappear? Do you think globalization is a threat to linguistic diversity?
3. Is it possible to revive a language that is on the brink of extinction? Why?
4. How can you save a language under the influence of global connections?
5. The more dialects a language has, the safer it is. True or false?
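
One way to approach question 5 with data rather than by eye is to summarize the dialect counts for each status. The sketch below uses a toy table with made-up values; with the notebook's data, the same `groupby` call could be applied to `df` instead, assuming the cleaned columns shown earlier.

```python
import pandas as pd

# Toy table with the same column names as the cleaned dataset (values are made up)
toy = pd.DataFrame({
    'name': ['Language A', 'Language B', 'Language C', 'Language D'],
    'child_dialect_count': [4, 2, 1, 0],
    'status': ['safe', 'safe', 'vulnerable', 'extinct'],
})

# Number of languages and median dialect count for each status
summary = toy.groupby('status')['child_dialect_count'].agg(['count', 'median'])
print(summary)
```

A higher median for safer statuses would support the statement, but it would not prove that dialects are what keeps a language safe.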

## Languages spoken within a country

Now that we know about various languages, let us dive a bit deeper. It would be helpful to analyze the languages spoken in various countries and how their use has changed over time. A demographic dataset from the [United Nations](http://data.un.org/Data.aspx?d=POP&f=tableCode:27) is used here.

In [None]:
# Import the dataset
df2 = pd.read_csv('../Data/UNdata_Countrywise.csv')

df2 = df2.iloc[:-79] # Remove footnotes

# Display top 5 rows
df2.head()

In [None]:
from plotly.subplots import make_subplots

def show_pie_chart(ev):
    
    # Filter the data for the selected country and gender
    to_be_removed = ['Unknown','Not stated','None','Total']   # Remove these categories
    # Combine all conditions into one boolean mask instead of chaining filters
    mask = (df2['Country or Area'] == country_menu.value) \
           & (df2['Sex'] == gender_menu.value) \
           & (df2['Area'] == 'Total') \
           & (~df2['Language'].isin(to_be_removed))
    df2_sub = df2[mask]
    
    # Plot the pie chart
    if(df2_sub.shape[0] == 0): 
        print('Sorry ... data is not available :-(')   # Show comment if data is not available
    else:
        years = df2_sub['Year'].unique()[0:2]   # Take the first two years listed (the two latest, assuming the file lists newest years first)
        
        # Make subplots
        specs = [{'type':'domain'}, {'type':'domain'}]
        fig = make_subplots(rows=1, cols=len(years), specs=[specs[0:len(years)]])
        
        # Add trace for each year's pie chart
        for i,j in enumerate(years):
            new_df = df2_sub[df2_sub['Year'] == j]
            fig.add_trace(go.Pie(labels=new_df['Language'], values=new_df['Value'], name='Year<br>{}'.format(j), textinfo="none"),1,i+1) 
        
        # Use 'hole' to create a donut-like pie chart
        fig.update_traces(hole=.3, hoverinfo="label + percent + name")
        
        # Update the title of the figure
        fig.update_layout(title_text='Languages spoken in {} ({})<br>\
Source: <a href="http://data.un.org/Data.aspx?d=POP&f=tableCode:27">\
United Nations</a>'.format(country_menu.value,gender_menu.value))
    
        fig.show()

Run the cell below and select the country and gender you want to analyze. Don't forget to click the `Show Pie Chart` button.

In [None]:
# Layout for widgets
box_layout = Layout(display='flex', flex_flow='row', align_items='center', width='100%', justify_content = 'center')
style = {'description_width': 'initial'}

# Create dropdown menu for Country and Gender
country_menu = widgets.Dropdown(options = df2['Country or Area'].unique(), description ='Country: ', style = style, disabled=False)
gender_menu = widgets.Dropdown(options = df2['Sex'].unique(), description ='Gender: ', style = style, disabled=False)

# Create Show Pie Chart button and define click events
show_button = widgets.Button(button_style= 'info', description="Show Pie Chart")
show_button.on_click(show_pie_chart)

# Define display order for the buttons and menus
display(Box(children = [country_menu, gender_menu], layout = box_layout))
display(VBox(children = [show_button], layout = Layout(display= 'flex', flex_flow= 'column', align_items= 'center', width='100%', justify_content = 'center')))

Each pie chart here represents a year in which data was collected. Move the cursor around to see what share of the total population speaks a particular language and how that pattern has changed over time!
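
The percentages shown on hover can also be computed directly, which makes the change between two years easier to compare. This is a minimal sketch on a toy table: the column names `Year`, `Language` and `Value` follow the UN dataset used above, while the rows and numbers are made up for illustration.

```python
import pandas as pd

# Toy data: two languages counted in two census years (values are made up)
toy = pd.DataFrame({
    'Year': [2016, 2016, 2011, 2011],
    'Language': ['Language X', 'Language Y', 'Language X', 'Language Y'],
    'Value': [60.0, 40.0, 75.0, 25.0],
})

# Share of each language within its year, in percent
toy['Share (%)'] = 100 * toy['Value'] / toy.groupby('Year')['Value'].transform('sum')

# One row per language, one column per year, plus the change between the two years
shares = toy.pivot(index='Language', columns='Year', values='Share (%)')
shares['Change'] = shares[2016] - shares[2011]
print(shares)
```

A positive value in the `Change` column means the language gained share between the two years; a negative value means it lost share.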

### Questions:
1. List the top 5 languages used in Canada. Is there any foreign language in the list?
2. Has the use of languages other than *English* and *French* increased among Canadians over time?
3. Share your thoughts on whether globalization can or cannot help languages spread across borders.

## Exploring other languages

Let us learn bits and pieces of other languages. Here we will translate words of your choice into a language you choose. The translation is accompanied by an audio clip that lets you hear how the translated text is pronounced.

Excited? Run the code cells below!
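
The next cell starts by inverting the dictionary returned by `gtts.lang.tts_langs()`, which maps language codes to language names, so that the name picked in the dropdown menu can be turned back into a code. The sketch below shows the same inversion on a small hand-picked subset, so it runs without installing anything.

```python
# A small hand-picked subset of code-to-name pairs, used only for illustration
code_to_name = {'en': 'English', 'fr': 'French', 'hi': 'Hindi'}

# Swap keys and values: language name -> language code
name_to_code = {name: code for code, name in code_to_name.items()}

print(name_to_code['French'])
```

The inversion is safe only while every language name is unique; if two codes shared a name, the later one would overwrite the earlier one.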

In [None]:
all_languages = gtts.lang.tts_langs()
all_languages = {v: k for k, v in all_languages.items()}

def translate_and_pronounce(ev):
    
    # Create a translator object using Google Translate
    translator = googletrans.Translator()
    translation_data = translator.translate(textbox_input.value,\
                       dest=all_languages[languages_menu.value])   # Supply text to translate and corresponding language
    translation = translation_data.extra_data['translation'][0][0]   # Extract translated data
    
    # Extract pronunciation for the translated text
    if(len(translation_data.extra_data['translation']) == 2):
        pronunciation = translation_data.extra_data['translation'][1][-1]
    else:
        pronunciation = 'None - Speak as you read'   # For languages using Roman letters
       
    print('\n')
    
    # Create and display read-only textboxes for the translation and its pronunciation
    textbox_translation = widgets.Text(value = translation, description = 'Translation: ', disabled = True)
    textbox_pronunciation = widgets.Text(value = pronunciation, description = 'Pronunciation: ', disabled = True)
    display(Box(children = [textbox_translation, textbox_pronunciation], layout = box_layout))  
       
    # Create an audio file for the translated text using Google Text-to-Speech
    tts = gtts.gTTS(translation, lang=all_languages[languages_menu.value])
    tts.save('audio.mp3')
    
    # Display audio widget
    display(Audio('audio.mp3'))

In [None]:
# Layout for widgets
box_layout = Layout(display='flex', flex_flow='row', align_items='center', width='100%', justify_content = 'center')
style = {'description_width': 'initial'}

# Create textbox for "Text" and dropdown menu for languages
textbox_input = widgets.Text(value = '',description = 'Text: ', disabled = False)
languages_menu = widgets.Dropdown(options = list(all_languages.keys())[:-18], description ='Language: ', style = style, disabled=False)

# Create Translate button and define click events
show_button = widgets.Button(button_style= 'info', description="Translate")
show_button.on_click(translate_and_pronounce)

# Define display order for the buttons and menus
display(Box(children = [textbox_input, languages_menu], layout = box_layout))
display(VBox(children = [show_button], layout = Layout(display= 'flex', flex_flow= 'column', align_items= 'center', width='100%', justify_content = 'center')))

### Questions:

1. In the era of globalization, how important is it to learn languages other than your native language?
2. List languages that share a script (alphabet), such as [Roman](https://en.wikipedia.org/wiki/Latin_script) or [Devanagari](https://en.wikipedia.org/wiki/Devanagari).

![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banners_Bottom_06.06.18.jpg?raw=true)