# Project "Dashboards for TED talks"

# Project Description
TED (short for Technology, Entertainment, Design) is a nonprofit foundation that runs popular conferences. Experts from different fields give talks there on current social, cultural, and scientific topics.

In this project, we will explore the history of TED conferences using Tableau.

# Data Description
The tableau_project_data_1.csv, tableau_project_data_2.csv, and tableau_project_data_3.csv files contain data on the talks. All three have the same structure:
- talk_id — speech ID;
- url — link to the recording of the speech;
- title — title of the speech;
- description — short description;
- film_date — date of recording the speech;
- duration — duration in seconds;
- views — number of views;
- main_tag — the main category to which the performance belongs;
- speaker_id — the unique identifier of the author of the speech;
- laughter_count — the number of times the audience laughed during the performance;
- applause_count — the number of times the audience applauded during the performance;
- language — the language in which the speech was conducted;
- event_id — unique conference ID.

The tableau_project_event_dict.csv file is a conference directory. Table Description:
- conf_id — unique conference ID;
- event — name of the conference;
- country — country of the conference.

The tableau_project_speakers_dict.csv file is a directory of speakers. Table Description:
- author_id — the unique identifier of the author of the speech;
- speaker_name — author's name;
- speaker_occupation — professional field of the author;
- speaker_description — description of the author's professional activity.
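
The key columns have different names across the files: judging by the descriptions above, `event_id` in the talk files corresponds to `conf_id` in the conference directory, and `speaker_id` corresponds to `author_id` in the speaker directory. A minimal sketch of how the tables could be joined in pandas follows. The rows are made up for illustration and are not taken from the project files.

```python
import pandas as pd

# Toy rows for illustration only; these are not taken from the project files.
talks = pd.DataFrame({
    "talk_id": [1, 2, 3],
    "speaker_id": [10, 11, 10],
    "event_id": [100, 100, 101],
})
events = pd.DataFrame({
    "conf_id": [100, 101],
    "event": ["Conference A", "Conference B"],
    "country": ["United States", "Canada"],
})
authors = pd.DataFrame({
    "author_id": [10, 11],
    "speaker_name": ["Speaker X", "Speaker Y"],
})

# Left joins keep every talk even if a directory entry is missing.
joined = (
    talks
    .merge(events, how="left", left_on="event_id", right_on="conf_id")
    .merge(authors, how="left", left_on="speaker_id", right_on="author_id")
)
print(joined[["talk_id", "event", "country", "speaker_name"]])
```

The same key pairs would be used when relating the tables in Tableau.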

# Work plan:
1. Open the files and perform data preprocessing.
2. In Tableau, build the dashboards "History of speeches", "Topics of speeches", "Authors of speeches", and one dashboard on a free topic.
3. Create a presentation.

## Let's open the files, study the general information, and preprocess the data

In [1]:
# Importing the libraries that we will need in this project
import pandas as pd
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
import warnings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.options.display.max_colwidth = 400

In [2]:
# Let's read the data from the csv file into a dataframe and save it to a variable
try:
    data_1 = pd.read_csv('tableau_project_data_1.csv')
except FileNotFoundError:
    # Fall back to the hosted copy when the file is not in the working directory
    data_1 = pd.read_csv('https://code.s3.yandex.net/datasets/tableau_project_data_1.csv')
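
The same local-then-remote loading pattern is repeated for every file in this notebook. One possible way to factor it out is sketched below. The helper name `load_csv` is hypothetical and not part of the original notebook; the base URL is the one used in the fallback above.

```python
from pathlib import Path

import pandas as pd

# Base URL taken from the notebook's own fallback path.
BASE_URL = "https://code.s3.yandex.net/datasets/"


def load_csv(filename, base_url=BASE_URL):
    """Read a CSV from disk if it exists, otherwise from base_url + filename.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    local_path = Path(filename)
    if local_path.exists():
        return pd.read_csv(local_path)
    return pd.read_csv(base_url + filename)
```

With such a helper, each loading cell would reduce to a single call, for example `data_1 = load_csv('tableau_project_data_1.csv')`.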

In [3]:
data_1.head(10)

Unnamed: 0,talk_id,url,title,description,film_date,duration,views,main_tag,speaker_id,laughter_count,applause_count,language,event_id
0,84216,https://www.ted.com/talks/christina_costa_how_gratitude_rewires_your_brain,How gratitude rewires your brain,"When a psychologist who studies well-being ends up with a brain tumor, what happens when she puts her own research into practice? Christina Costa goes beyond the ""fight"" narrative of cancer -- or any formidable personal journey -- to highlight the brain benefits of an empowering alternative to fostering resilience in the face of unexpected challenges: gratitude.",2021-03-27,600,718724,health,6625,0.0,0.0,English,309
1,66033,https://www.ted.com/talks/caitlin_holman_how_game_design_can_help_schooling,How game design can help schooling,The world is changing rapidly but models of decades-old schooling still influence educational systems in ways that leave students with few reasons to truly excel. Education researcher Caitlin Holman explains how game design principles engineered to keep players motivated can be co-opted to revitalize schools and improve educational outcomes.,2017-02-08,1043,46441,education,53443,,,English,309
2,21933,https://www.ted.com/talks/terri_conley_we_need_to_rethink_casual_sex,We need to rethink casual sex,"Social psychologist and sex researcher Terri Conley thinks it's high time we stop feeling guilty about enjoying casual sex—no matter what society says. In this entertaining talk, Conley interrogates three common myths about sexuality and gender and suggests a few new, guilt-free ways to think about our sex lives.",2016-04-01,1091,273438,society,5107,0.0,0.0,English,309
3,2022,https://www.ted.com/talks/anne_curzan_what_makes_a_word_real,"What makes a word ""real""?","One could argue that slang words like ‘hangry,’ ‘defriend’ and ‘adorkable’ fill crucial meaning gaps in the English language, even if they don't appear in the dictionary. After all, who actually decides which words make it into those pages? Language historian Anne Curzan gives a charming look at the humans behind dictionaries, and the choices they make.",2014-03-15,1033,2031550,culture,1938,12.0,4.0,English,309
4,83538,https://www.ted.com/talks/jane_walsh_the_rise_of_predatory_scams_and_how_to_prevent_them,The rise of predatory scams -- and how to prevent them,"Questionable phone calls, concerning emails, heart-rending stories from a sudden new friend in need of endless financial support: elder abuse can take many forms, says lawyer Jane Walsh. And as technology becomes more sophisticated, susceptibility to tricks and scams will increase -- no matter a person's age or intellect. Walsh spotlights the rise of this predatory crime, why it goes undetecte...",2021-06-26,833,802109,technology,6606,0.0,0.0,English,233
5,81821,https://www.ted.com/talks/rebecca_galemba_how_employers_steal_from_workers_and_get_away_with_it_sep_2021,How employers steal from workers -- and get away with it,"When you work, you expect to be paid for it. Except, for millions of Americans employed across a range of industries like restaurants and construction, that's not always the case. Anthropologist Rebecca Galemba explores the multibillion-dollar problem of wage theft and how employers get away with it, highlighting the changes needed for them to pay up -- and fairly.",2021-03-20,578,1199939,economics,6540,0.0,0.0,English,233
6,80115,https://www.ted.com/talks/kevin_j_krizek_how_covid_19_reshaped_us_cities,How COVID-19 reshaped US cities,"The pandemic spurred an unprecedented reclamation of urban space, ushering in a seemingly bygone era of pedestrian pastimes, as cars were sidelined in favor of citizens. Highlighting examples from across the United States, environmental designer Kevin J. Krizek reflects on how temporary shifts -- like transforming streets into places for dining, recreation and community -- can become permanent...",2021-03-20,581,1306282,cities,6484,,,English,233
7,74616,https://www.ted.com/talks/katherine_m_gehl_us_politics_isn_t_broken_it_s_fixed,US politics isn't broken. It's fixed,"The ""broken"" US political system is actually working exactly as designed, says business leader and activist Katherine Gehl. Examining the system through a nonpartisan lens, she makes the case for voting innovations, already implemented in parts of the country, that give citizens more choice and incentivize politicians to work towards progress and solutions instead of just reelection.",2020-12-05,1009,1264323,social change,6306,0.0,0.0,English,233
8,74282,https://www.ted.com/talks/amber_mcreynolds_an_election_system_that_puts_voters_not_politicians_first,An election system that puts voters (not politicians) first,"From hours-long lines and limited polling locations to confusing and discriminatory registration policies, why is it so hard to vote in the US? Voting rights expert Amber McReynolds offers a proven alternative: a new process, already happening in parts of the country, that could bring accountability, transparency and equity to the outdated and sputtering system that American democracy currentl...",2020-08-29,619,1288197,society,6301,0.0,0.0,English,233
9,75168,https://www.ted.com/talks/joan_c_williams_why_corporate_diversity_programs_fail_and_how_small_tweaks_can_have_big_impact,Why corporate diversity programs fail -- and how small tweaks can have big impact,"Companies in the US spend billions of dollars each year on diversity, equity and inclusion initiatives, but subtle (and not so subtle) workplace biases often cost these initiatives -- and the people they're meant to help -- big time by undermining their goals. DEI expert Joan C. Williams identifies five common patterns of bias that cause these programs to fail -- and offers a data-driven appro...",2020-12-05,883,1252694,business,6314,0.0,0.0,English,233


In [4]:
data_1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1152 entries, 0 to 1151
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   talk_id         1152 non-null   int64  
 1   url             1152 non-null   object 
 2   title           1152 non-null   object 
 3   description     1152 non-null   object 
 4   film_date       1152 non-null   object 
 5   duration        1152 non-null   int64  
 6   views           1152 non-null   int64  
 7   main_tag        1151 non-null   object 
 8   speaker_id      1152 non-null   int64  
 9   laughter_count  1003 non-null   float64
 10  applause_count  1003 non-null   float64
 11  language        1152 non-null   object 
 12  event_id        1152 non-null   int64  
dtypes: float64(2), int64(5), object(6)
memory usage: 117.1+ KB


### Let's check how many missing values there are in our table

In [5]:
data_1.isna().sum()

talk_id             0
url                 0
title               0
description         0
film_date           0
duration            0
views               0
main_tag            1
speaker_id          0
laughter_count    149
applause_count    149
language            0
event_id            0
dtype: int64

In [6]:
# Styler.set_precision was removed in pandas 2.0; Styler.format(precision=...) replaces it
pd.DataFrame(data_1.isna().mean()*100).style.format(precision=1).background_gradient('coolwarm')

Unnamed: 0,0
talk_id,0.0
url,0.0
title,0.0
description,0.0
film_date,0.0
duration,0.0
views,0.0
main_tag,0.1
speaker_id,0.0
laughter_count,12.9
applause_count,12.9
language,0.0
event_id,0.0


**Conclusion: the missing values in the laughter_count and applause_count columns cannot be recovered, so we leave them as they are. Only one row has no main_tag value, so that row can be dropped.**
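
The percentages in the table above come from taking the mean of a boolean mask: `isna()` marks each missing cell as `True`, and `True` counts as 1 when averaged. A small sketch with a made-up series, followed by the same calculation from the counts reported for data_1:

```python
import numpy as np
import pandas as pd

# Toy column for illustration only: two of four values are missing.
col = pd.Series([1.0, np.nan, 3.0, np.nan])
mask = col.isna()
share_missing = mask.mean() * 100
print(share_missing)

# Same figure from the counts reported above: 149 missing out of 1152 rows
print(round(149 / 1152 * 100, 1))
```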

In [7]:
data_1 = data_1.dropna(subset=['main_tag'])
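
As a quick illustration of what `dropna(subset=...)` does, here is a sketch on a made-up frame (the values are not from the project files). Only rows with a missing `main_tag` are removed; missing values in other columns stay in place.

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only.
toy = pd.DataFrame({
    "talk_id": [1, 2, 3],
    "main_tag": ["health", None, "design"],
    "laughter_count": [0.0, 2.0, np.nan],
})

# Drop only the rows where main_tag is missing.
cleaned = toy.dropna(subset=["main_tag"])
print(cleaned)
print(cleaned.isna().sum())
```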

In [8]:
data_1.isna().sum()

talk_id             0
url                 0
title               0
description         0
film_date           0
duration            0
views               0
main_tag            0
speaker_id          0
laughter_count    149
applause_count    149
language            0
event_id            0
dtype: int64

### Let's check how many explicit duplicates there are in our table

In [9]:
data_1.duplicated().sum()

0

**Conclusion: no explicit duplicates (fully identical rows) were found.**

### Let's check how many implicit duplicates there are in our table

In [10]:
data_1['main_tag'].unique()

array(['health', 'education', 'society', 'culture', 'technology',
       'economics', 'cities', 'social change', 'business', 'history',
       'future', 'nature', 'global issues', 'science', 'architecture',
       'communication', 'community', 'humanity', 'identity',
       'decision-making', 'personal growth', 'creativity', 'psychology',
       'design', 'activism', 'politics', 'relationships', 'space', 'art',
       'performance', 'work', 'entertainment', 'animation', 'math',
       'medicine', 'environment', 'ocean', 'innovation', 'self', 'brain',
       'biology', 'ted fellows', 'film', 'ai', 'humor', 'violence',
       'data', 'health care', 'food', 'gender', 'sustainability'],
      dtype=object)

In [11]:
data_1['language'].unique()

array(['English', 'French', 'Spanish'], dtype=object)

**Conclusion: no implicit duplicates were found.**
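
The check above was done by reading the lists of unique values. With longer lists, one possible way to surface implicit duplicates is to normalize case and whitespace and see which values collapse together. The sketch below uses a made-up series, not the project data.

```python
import pandas as pd

# Toy values for illustration only: some differ just by case or trailing spaces.
tags = pd.Series(["Health", "health ", "health care", "design", "Design"])

# Normalize spelling, then count how many raw variants map to each normalized value.
normalized = tags.str.strip().str.lower()
variants = tags.groupby(normalized).nunique()
print(sorted(normalized.unique()))
print(variants[variants > 1])
```

Values that differ in meaning, such as "health" and "health care", stay separate under this normalization and still need a manual look.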

### Save the modified file to continue working with it in Tableau

In [12]:
data_1.to_csv('tableau_project_data_11.csv', index=False)

In [13]:
# Let's read the data from the csv file into a dataframe and save it to a variable
try:
    data_2 = pd.read_csv('tableau_project_data_2.csv')
except FileNotFoundError:
    # Fall back to the hosted copy when the file is not in the working directory
    data_2 = pd.read_csv('https://code.s3.yandex.net/datasets/tableau_project_data_2.csv')

In [14]:
data_2.head(10)

Unnamed: 0,talk_id,url,title,description,film_date,duration,views,main_tag,speaker_id,laughter_count,applause_count,language,event_id
0,21936,https://www.ted.com/talks/leila_seth_why_i_defend_women_s_inheritance_rights,Why I defend women's inheritance rights,"During her stint on the Law Commission of India, Leila Seth, the first woman to be appointed to the Delhi High Court as a judge, led a campaign to grant women inheritance rights to ancestral property. In this passionate talk, she discusses the experiences that motivated her to defend the property rights and financial independence of women in a patriarchal society like India's.",2015-05-29,956,707770,social change,20191,6.0,6.0,English,194
1,2483,https://www.ted.com/talks/aditi_gupta_a_taboo_free_way_to_talk_about_periods,A taboo-free way to talk about periods,"It's true: talking about menstruation makes many people uncomfortable. And that taboo has consequences: in India, three out of every 10 girls don't even know what menstruation is at the time of their first period, and restrictive customs related to periods inflict psychological damage on young girls. Growing up with this taboo herself, Aditi Gupta knew she wanted to help girls, parents and tea...",2015-05-29,670,1828032,education,2959,5.0,6.0,English,194
2,19995,https://www.ted.com/talks/boy_girl_banjo_dead_romance,"""Dead Romance""","Acoustic duo Anielle Reid and Matthew Brookshire (playing together as Boy Girl Banjo) take the TED stage to perform their original song ""Dead Romance,"" weaving together the sounds of Americana folk music and modern pop.",2017-11-16,222,312226,performance,3757,0.0,0.0,English,97
3,12571,https://www.ted.com/talks/drew_philp_my_500_house_in_detroit_and_the_neighbors_who_helped_me_rebuild_it,My $500 house in Detroit -- and the neighbors who helped me rebuild it,"In 2009, journalist and screenwriter Drew Philp bought a ruined house in Detroit for $500. In the years that followed, as he gutted the interior and removed the heaps of garbage crowding the rooms, he didn't just learn how to repair a house -- he learned how to build a community. In a tribute to the city he loves, Philp tells us about ""radical neighborliness"" and makes the case that we have ""t...",2017-11-16,823,1384861,culture,3763,1.0,1.0,English,97
4,13000,https://www.ted.com/talks/andrew_dent_to_eliminate_waste_we_need_to_rediscover_thrift,"To eliminate waste, we need to rediscover thrift","There's no such thing as throwing something away, says Andrew Dent -- when you toss a used food container, broken toy or old pair of socks into the trash, those things inevitably end up in ever-growing landfills. But we can get smarter about the way we make, and remake, our products. Dent shares exciting examples of thrift -- the idea of using and reusing what you need so you don't have to pur...",2017-11-16,634,1451874,technology,3761,0.0,1.0,English,97
5,10802,https://www.ted.com/talks/nilay_kulkarni_a_life_saving_invention_that_prevents_human_stampedes,A life-saving invention that prevents human stampedes,"Every three years, more than 30 million Hindu worshippers gather for the Kumbh Mela in India, the world's largest religious gathering, in order to wash away their sins. With massive crowds descending on small cities and towns, stampedes inevitably happen, and in 2003, 39 people were killed during the festival. In 2014, then 15-year-old Nilay Kulkarni decided to put his skills as a self-taught ...",2018-01-24,465,1092617,design,3913,6.0,2.0,English,97
6,9473,https://www.ted.com/talks/danielle_wood_6_space_technologies_we_can_use_to_improve_life_on_earth,6 space technologies we can use to improve life on Earth,"Danielle Wood leads the Space Enabled research group at the MIT Media Lab, where she works to tear down the barriers that limit the benefits of space exploration to only the few, the rich or the elite. She identifies six technologies developed for space exploration that can contribute to sustainable development across the world -- from observation satellites that provide information to aid org...",2017-11-16,651,1393328,science,3758,0.0,1.0,English,97
7,3612,https://www.ted.com/talks/victoria_pratt_how_judges_can_show_respect,How judges can show respect,"In halls of justice around the world, how can we ensure everyone is treated with dignity and respect? A pioneering judge in New Jersey, Victoria Pratt shares her principles of ""procedural justice"" -- four simple, thoughtful steps that redefined the everyday business of her courtroom in Newark, changing lives along the way. ""When the court behaves differently, naturally people respond different...",2016-10-05,963,1229338,culture,2905,8.0,1.0,English,97
8,2790,https://www.ted.com/talks/sofi_tukker_awoo,"""Awoo""","Electro-pop duo Sofi Tukker dance it out with the TED audience in a performance of their upbeat, rhythmic song ""Awoo,"" featuring Betta Lemme.",2017-03-08,212,1964046,entertainment,3364,2.0,3.0,English,97
9,2810,https://www.ted.com/talks/sinead_burke_why_design_should_include_everyone,Why design should include everyone,"Sinéad Burke is acutely aware of details that are practically invisible to many of us. At 105 centimeters (or 3' 5"") tall, the designed world -- from the height of a lock to the range of available shoe sizes -- often inhibits her ability to do things for herself. Here she tells us what it's like to navigate the world as a little person and asks: ""Who are we not designing for?""",2017-03-08,597,1556219,design,3363,2.0,1.0,English,97


In [15]:
data_2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1152 entries, 0 to 1151
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   talk_id         1152 non-null   int64  
 1   url             1152 non-null   object 
 2   title           1152 non-null   object 
 3   description     1152 non-null   object 
 4   film_date       1152 non-null   object 
 5   duration        1152 non-null   int64  
 6   views           1152 non-null   int64  
 7   main_tag        1151 non-null   object 
 8   speaker_id      1152 non-null   int64  
 9   laughter_count  1013 non-null   float64
 10  applause_count  1013 non-null   float64
 11  language        1152 non-null   object 
 12  event_id        1152 non-null   int64  
dtypes: float64(2), int64(5), object(6)
memory usage: 117.1+ KB


### Let's check how many missing values there are in our table

In [16]:
data_2.isna().sum()

talk_id             0
url                 0
title               0
description         0
film_date           0
duration            0
views               0
main_tag            1
speaker_id          0
laughter_count    139
applause_count    139
language            0
event_id            0
dtype: int64

In [17]:
# Styler.set_precision was removed in pandas 2.0; Styler.format(precision=...) replaces it
pd.DataFrame(data_2.isna().mean()*100).style.format(precision=1).background_gradient('coolwarm')

Unnamed: 0,0
talk_id,0.0
url,0.0
title,0.0
description,0.0
film_date,0.0
duration,0.0
views,0.0
main_tag,0.1
speaker_id,0.0
laughter_count,12.1
applause_count,12.1
language,0.0
event_id,0.0


**Conclusion: the missing values in the laughter_count and applause_count columns cannot be recovered, so we leave them as they are. Only one row has no main_tag value, so that row can be dropped.**

In [18]:
data_2 = data_2.dropna(subset=['main_tag'])

### Let's check how many explicit duplicates there are in our table

In [19]:
data_2.duplicated().sum()

0

**Conclusion: no explicit duplicates (fully identical rows) were found.**

### Let's check how many implicit duplicates there are in our table

In [20]:
data_2['main_tag'].unique()

array(['social change', 'education', 'performance', 'culture',
       'technology', 'design', 'science', 'entertainment', 'society',
       'global issues', 'health', 'activism', 'innovation', 'data',
       'business', 'humanity', 'humor', 'computers', 'health care',
       'africa', 'biology', 'brain', 'math', 'music', 'sustainability',
       'economics', 'personal growth', 'art', 'environment', 'history',
       'invention', 'animation', 'communication', 'ted fellows',
       'politics', 'equality', 'space', 'creativity', 'money', 'nature',
       'disability', 'internet', 'work', 'community', 'women', 'identity',
       'public health', 'government', 'ocean', 'writing', 'violence',
       'storytelling', 'philosophy', 'future', 'collaboration', 'cities',
       'psychology', 'engineering', 'neuroscience', 'policy', 'medicine',
       'natural disaster', 'animals', 'happiness', 'decision-making',
       'climate change', 'kids', 'exploration'], dtype=object)

In [21]:
data_2['language'].unique()

array(['English'], dtype=object)

**Conclusion: no implicit duplicates were found.**

### Save the modified file to continue working with it in Tableau

In [22]:
data_2.to_csv('tableau_project_data_22.csv', index=False)

In [23]:
# Let's read the data from the csv file into a dataframe and save it to a variable
try:
    data_3 = pd.read_csv('tableau_project_data_3.csv')
except FileNotFoundError:
    # Fall back to the hosted copy when the file is not in the working directory
    data_3 = pd.read_csv('https://code.s3.yandex.net/datasets/tableau_project_data_3.csv')

In [24]:
data_3.head(10)

Unnamed: 0,talk_id,url,title,description,film_date,duration,views,main_tag,speaker_id,laughter_count,applause_count,language,event_id
0,1841,https://www.ted.com/talks/andrew_fitzgerald_adventures_in_twitter_fiction,Adventures in Twitter fiction,"In the 1930s, broadcast radio introduced an entirely new form of storytelling; today, micro-blogging platforms like Twitter are changing the scene again. Andrew Fitzgerald takes a look at the (aptly) short but fascinating history of new forms of creative experimentation in fiction and storytelling.",2013-07-15,715,1049942,technology,1689,1.0,1.0,English,112
1,1817,https://www.ted.com/talks/jake_barton_the_museum_of_you,The museum of you,"A third of the world watched live as the World Trade Center collapsed on September 11, 2001; a third more heard about it within 24 hours. (Do you remember where you were?) So exhibits at the soon-to-open 9/11 Memorial Museum will reflect the diversity of the world's experiences of that day. In a moving talk, designer Jake Barton gives a peek at some of those installations, as well as several o...",2013-05-16,938,831812,design,1679,1.0,1.0,English,112
2,1776,https://www.ted.com/talks/bob_mankoff_anatomy_of_a_new_yorker_cartoon,Anatomy of a New Yorker cartoon,"The New Yorker receives around 1,000 cartoons each week; it only publishes about 17 of them. In this hilarious, fast-paced, and insightful talk, the magazine's longstanding cartoon editor and self-proclaimed ""humor analyst"" Bob Mankoff dissects the comedy within just some of the ""idea drawings"" featured in the magazine, explaining what works, what doesn't, and why.",2013-05-16,1259,1392017,design,1660,27.0,2.0,English,112
3,1743,https://www.ted.com/talks/jay_silver_hack_a_banana_make_a_keyboard,"Hack a banana, make a keyboard!","Why can't two slices of pizza be used as a slide clicker? Why shouldn't you make music with ketchup? In this charming talk, inventor Jay Silver talks about the urge to play with the world around you. He shares some of his messiest inventions, and demos MaKey MaKey, a kit for hacking everyday objects.",2013-04-24,795,1474788,technology,1650,5.0,3.0,English,112
4,1752,https://www.ted.com/talks/paola_antonelli_why_i_brought_pac_man_to_moma,Why I brought Pac-Man to MoMA,"When the Museum of Modern Art's senior curator of architecture and design announced the acquisition of 14 video games in 2012, ""all hell broke loose."" In this far-ranging, entertaining, and deeply insightful talk, Paola Antonelli explains why she's delighted to challenge preconceived ideas about art and galleries, and describes her burning wish to help establish a broader understanding of design.",2013-05-16,1122,1023829,design,148,8.0,2.0,English,112
5,1669,https://www.ted.com/talks/esther_perel_the_secret_to_desire_in_a_long_term_relationship,The secret to desire in a long-term relationship,"In long-term relationships, we often expect our beloved to be both best friend and erotic partner. But as Esther Perel argues, good and committed sex draws on two conflicting needs: our need for security and our need for surprise. So how do you sustain desire? With wit and eloquence, Perel lets us in on the mystery of erotic intelligence.",2013-02-11,1150,18952508,culture,1542,8.0,2.0,English,112
6,1675,https://www.ted.com/talks/bruce_feiler_agile_programming_for_your_family,Agile programming -- for your family,"Bruce Feiler has a radical idea: To deal with the stress of modern family life, go agile. Inspired by agile software programming, Feiler introduces family practices which encourage flexibility, bottom-up idea flow, constant feedback and accountability. One surprising feature: Kids pick their own punishments.",2013-02-11,1080,1749935,culture,939,2.0,2.0,English,112
7,1783,https://www.ted.com/talks/mohamed_hijri_a_simple_solution_to_the_coming_phosphorus_crisis,A simple solution to the coming phosphorus crisis,"There's a farming crisis no one is talking about: The world is running out of phosphorus, an essential element that's a key component of DNA and the basis of cellular communication. As biologist Mohamed Hijri shows, all roads of this crisis lead back to how we farm -- with chemical fertilizers chock-full of the element, which plants are not efficient at absorbing. One solution? A microscopic m...",2013-10-03,821,678494,science,1698,0.0,1.0,English,307
8,1830,https://www.ted.com/talks/kevin_breel_confessions_of_a_depressed_comic,Confessions of a depressed comic,"Kevin Breel didn't look like a depressed kid: team captain, at every party, funny and confident. But he tells the story of the night he realized that -- to save his own life -- he needed to say four simple words.",2013-05-05,660,4733198,mental health,1682,0.0,2.0,English,216
9,1797,https://www.ted.com/talks/tania_luna_how_a_penny_made_me_feel_like_a_millionaire,How a penny made me feel like a millionaire,"As a young child, Tania Luna left her home in post-Chernobyl Ukraine to take asylum in the US. And one day, on the floor of the New York homeless shelter where she and her family lived, she found a penny. She has never again felt so rich. A meditation on the bittersweet joys of childhood -- and how to hold them in mind.",2012-07-12,331,1855539,culture,1669,0.0,1.0,English,63


In [25]:
data_3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1152 entries, 0 to 1151
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   talk_id         1152 non-null   int64  
 1   url             1152 non-null   object 
 2   title           1152 non-null   object 
 3   description     1152 non-null   object 
 4   film_date       1152 non-null   object 
 5   duration        1152 non-null   int64  
 6   views           1152 non-null   int64  
 7   main_tag        1152 non-null   object 
 8   speaker_id      1152 non-null   int64  
 9   laughter_count  1123 non-null   float64
 10  applause_count  1123 non-null   float64
 11  language        1152 non-null   object 
 12  event_id        1152 non-null   int64  
dtypes: float64(2), int64(5), object(6)
memory usage: 117.1+ KB


### Let's check how many missing values there are in our table

In [26]:
data_3.isna().sum()

talk_id            0
url                0
title              0
description        0
film_date          0
duration           0
views              0
main_tag           0
speaker_id         0
laughter_count    29
applause_count    29
language           0
event_id           0
dtype: int64

In [27]:
# Styler.set_precision was removed in pandas 2.0; Styler.format(precision=...) replaces it
pd.DataFrame(data_3.isna().mean()*100).style.format(precision=1).background_gradient('coolwarm')

Unnamed: 0,0
talk_id,0.0
url,0.0
title,0.0
description,0.0
film_date,0.0
duration,0.0
views,0.0
main_tag,0.0
speaker_id,0.0
laughter_count,2.5
applause_count,2.5
language,0.0
event_id,0.0


**Conclusion: the missing values in the laughter_count and applause_count columns cannot be recovered, so we leave them as they are.**

### Let's check how many explicit duplicates there are in our table

In [28]:
data_3.duplicated().sum()

0

**Conclusion: no explicit duplicates (fully identical rows) were found.**

### Let's check how many implicit duplicates there are in our table

In [29]:
data_3['main_tag'].unique()

array(['technology', 'design', 'culture', 'science', 'mental health',
       'business', 'entertainment', 'global issues', 'creativity', 'art',
       'education', 'history', 'brain', 'social change', 'health care',
       'animals', 'identity', 'innovation', 'society', 'space',
       'engineering', 'health', 'economics', 'medicine', 'robots',
       'transportation', 'performance', 'environment', 'nature',
       'community', 'communication', 'data', 'storytelling', 'activism',
       'biology', 'invention', 'personal growth', 'visualizations',
       'africa', 'politics', 'math', 'kids', 'internet', 'animation',
       'humor', 'medical research', 'sustainability', 'humanity', 'work',
       'writing'], dtype=object)

In [30]:
data_3['language'].unique()

array(['English'], dtype=object)

**Conclusion: no implicit duplicates were found.**

### Save the modified file to continue working with it in Tableau

In [31]:
data_3.to_csv('tableau_project_data_33.csv', index=False)
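
The three talk files go through the same steps: drop rows without `main_tag`, then count fully duplicated rows. Since they share one structure, those steps could be wrapped in a single function. The sketch below is one possible shape for it; the name `clean_talks` is hypothetical and the frame is made up for illustration.

```python
import pandas as pd


def clean_talks(df):
    """Drop rows without main_tag and count fully duplicated rows.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    cleaned = df.dropna(subset=["main_tag"])
    n_duplicates = int(cleaned.duplicated().sum())
    return cleaned, n_duplicates


# Toy frame for illustration only; not taken from the project files.
toy = pd.DataFrame({
    "talk_id": [1, 2, 2, 3],
    "main_tag": ["health", "design", "design", None],
    "laughter_count": [0.0, 2.0, 2.0, 1.0],
})
cleaned, n_duplicates = clean_talks(toy)
print(len(cleaned), n_duplicates)
```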

In [32]:
# Let's read the data from the csv file into a dataframe and save it to a variable
try:
    conferences = pd.read_csv('tableau_project_event_dict.csv')
except FileNotFoundError:
    # Fall back to the hosted copy when the file is not in the working directory
    conferences = pd.read_csv('https://code.s3.yandex.net/datasets/tableau_project_event_dict.csv')

In [33]:
conferences.head(10)

Unnamed: 0,conf_id,event,country
0,0,Arbejdsglaede Live,United States
1,1,Business Innovation Factory,United States
2,2,Chautauqua Institution,United States
3,3,DLD 2007,United States
4,4,EG 2007,United States
5,5,EG 2008,United States
6,6,Elizabeth G. Anderson School,United States
7,7,Full Spectrum Auditions,United States
8,8,INK Conference,United States
9,9,LIFT 2007,United States


In [34]:
conferences.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 327 entries, 0 to 326
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   conf_id  327 non-null    int64 
 1   event    327 non-null    object
 2   country  327 non-null    object
dtypes: int64(1), object(2)
memory usage: 7.8+ KB


### Let's check how many missing values there are in our table

In [35]:
conferences.isna().sum()

conf_id    0
event      0
country    0
dtype: int64

**Conclusion: no missing values were found.**

### Let's check how many explicit duplicates there are in our table

In [36]:
conferences.duplicated().sum()

0

**Conclusion: no explicit duplicates (fully identical rows) were found.**

### Let's check how many implicit duplicates there are in our table

In [37]:
conferences['country'].unique()

array(['United States', 'Ecuador', 'Canada', 'Germany', 'UK', 'France',
       'Singapore', 'India', 'South Africa', 'Kenya', 'Tanzania',
       'Brazil', 'Switzerland', 'Nigeria', 'Netherlands', 'Norway',
       'Greece', 'Lebanon', 'Slovakia', 'Belgium', 'Australia', 'Hungary',
       'UAE', 'Ireland', 'Sweden', 'Israel', 'Poland', 'Japan',
       'Trinidad and Tobago', 'Argentina', 'Mexico', 'South Korea',
       'Austria'], dtype=object)

**Conclusion: no implicit duplicates were found. The file was not modified, so there is no need to save a new copy.**
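
Before joining the talks to this directory in Tableau, it may be worth checking that every `event_id` in the talk files has a matching `conf_id`. This check is not part of the original notebook. A minimal sketch on made-up rows:

```python
import pandas as pd

# Toy rows for illustration only; event 999 has no entry in the directory.
talks = pd.DataFrame({"talk_id": [1, 2, 3], "event_id": [100, 101, 999]})
events = pd.DataFrame({"conf_id": [100, 101], "event": ["Conference A", "Conference B"]})

# Talks whose event_id does not appear among the directory's conf_id values.
unmatched = talks[~talks["event_id"].isin(events["conf_id"])]
print(unmatched)
```

An empty result on the real data would mean that every talk can be matched to a conference.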

In [38]:
# Let's read the data from the csv file into a dataframe and save it to a variable
try:
    speakers = pd.read_csv('tableau_project_speakers_dict.csv')
except FileNotFoundError:
    # Fall back to the hosted copy when the file is not in the working directory
    speakers = pd.read_csv('https://code.s3.yandex.net/datasets/tableau_project_speakers_dict.csv')

In [39]:
speakers.head(10)

Unnamed: 0,author_id,speaker_name,speaker_occupation,speaker_description
0,2,Al Gore,Climate advocate,Nobel Laureate Al Gore focused the world's attention on the global climate crisis.
1,3,Amy Smith,inventor,"Amy Smith designs cheap, practical fixes for tough problems in developing countries. Among her many accomplishments, the MIT engineer received a MacArthur ""genius"" grant in 2004 and was the first woman to win the Lemelson-MIT Prize for turning her ideas into inventions."
2,4,Ashraf Ghani,President of Afghanistan,"Afghanistan's president Ashraf Ghani has initiated sweeping economic, trade, social and peace reforms."
3,5,Burt Rutan,Aircraft engineer,"In 2004, legendary spacecraft designer Burt Rutan won the $10M Ansari X-Prize for <em>SpaceShipOne,</em> the first privately funded craft to enter space twice in a two-week period. He's now collaborating with Virgin Galactic to build the first rocketship for space tourism."
4,6,Chris Bangle,Car designer,"Car design is a ubiquitous but often overlooked art form. As chief of design for the BMW Group, Chris Bangle has overseen cars that have been seen the world over, including BMW 7 Series and the Z4 roadster."
5,7,Craig Venter,Biologist,"In 2001, Craig Venter made headlines for sequencing the human genome. In 2003, he started mapping the ocean's biodiversity. And now he's created the first synthetic lifeforms -- microorganisms that can produce alternative fuels."
6,8,David Pogue,Technology columnist,"David Pogue is the personal technology columnist for the <em>New York Times</em> and a tech correspondent for CBS News. He's also one of the world's bestselling how-to authors, with titles in the For Dummies series and his own line of ""Missing Manual"" books."
7,9,David Rockwell,Architect,"Architect David Rockwell draws on his love of drama and spectacle to create fantastic, high-impact restaurants, cultural facilities, airline terminals, theater sets -- and playgrounds."
8,10,Dean Kamen,Inventor,"Dean Kamen landed in the limelight with the Segway, but he has been innovating since high school, with more than 150 patents under his belt. Recent projects include portable energy and water purification for the developing world, and a prosthetic arm for maimed soldiers."
9,11,Dean Ornish,Physician,Dean Ornish is a clinical professor at UCSF and founder of the Preventive Medicine Research Institute. He's a leading expert on fighting illness -- particularly heart disease with dietary and lifestyle changes.


In [40]:
speakers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2971 entries, 0 to 2970
Data columns (total 4 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   author_id            2971 non-null   int64 
 1   speaker_name         2971 non-null   object
 2   speaker_occupation   2971 non-null   object
 3   speaker_description  2958 non-null   object
dtypes: int64(1), object(3)
memory usage: 93.0+ KB


### Let's check how many missing values there are in our table

In [41]:
speakers.isna().sum()

author_id               0
speaker_name            0
speaker_occupation      0
speaker_description    13
dtype: int64

In [42]:
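# Share of missing values per column, in percent
# (Styler.set_precision is deprecated since pandas 1.3; format(precision=...) replaces it)
pd.DataFrame(speakers.isna().mean()*100).style.format(precision=1).background_gradient('coolwarm')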

Unnamed: 0,0
author_id,0.0
speaker_name,0.0
speaker_occupation,0.0
speaker_description,0.4


**Conclusion: the 13 missing values in the speaker_description column (0.4% of rows) cannot be restored, so we will drop those rows.**

In [43]:
speakers = speakers.dropna(subset=['speaker_description'])
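
Because `speakers` is a lookup table, dropping rows from it can leave talks without a matching speaker record once the tables are joined. The sketch below shows one way to measure that effect before dropping, along with a possible alternative that fills the gap instead. The tables are made-up miniatures, the placeholder text is an assumption, and only the column names come from the data description.

```python
import pandas as pd

# Made-up miniature tables; column names follow the data description.
speakers_sample = pd.DataFrame(
    {
        "author_id": [1, 2, 3],
        "speaker_name": ["Speaker A", "Speaker B", "Speaker C"],
        "speaker_description": ["Writes books.", None, "Designs cars."],
    }
)
talks_sample = pd.DataFrame({"talk_id": [10, 11, 12], "speaker_id": [1, 2, 2]})

# IDs of the speakers that dropna(subset=['speaker_description']) would remove.
dropped_ids = speakers_sample.loc[
    speakers_sample["speaker_description"].isna(), "author_id"
]

# Talks that would lose their speaker record after the drop.
orphaned_talks = talks_sample[talks_sample["speaker_id"].isin(dropped_ids)]
print(len(orphaned_talks))

# Possible alternative: keep every speaker and mark the missing text explicitly.
filled = speakers_sample.fillna({"speaker_description": "No description"})
```

If the count of orphaned talks is zero, dropping is harmless. Otherwise, filling the description keeps those talks attributable to a speaker.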

### Let's check how many explicit duplicates there are in our table

In [44]:
speakers.duplicated().sum()

0

**Conclusion: no explicit duplicates were found.**
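
Called without arguments, `duplicated()` flags only rows that match in every column. For a dictionary table it can also help to check the key column on its own. The sketch below uses made-up rows; only the column names come from the speakers dictionary.

```python
import pandas as pd

# Made-up rows for illustration; not taken from the project data.
sample = pd.DataFrame(
    {
        "author_id": [1, 2, 2, 3],
        "speaker_name": ["Speaker A", "Speaker B", "Speaker B", "Speaker C"],
        "speaker_occupation": ["Writer", "inventor", "Inventor", "Designer"],
    }
)

# Whole-row comparison: the two "Speaker B" rows differ in capitalization,
# so they are not reported as duplicates.
full_row_duplicates = sample.duplicated().sum()

# Key-only comparison: the repeated author_id is caught.
key_duplicates = sample.duplicated(subset=["author_id"]).sum()

print(full_row_duplicates, key_duplicates)
```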

### Save the modified file to continue working with it in Tableau

In [45]:
speakers.to_csv('tableau_project_speakers_dict_1.csv', index=False)

## Steps 2 and 3 were performed in Tableau. Link to the presentation: https://public.tableau.com/app/profile/aleksei.pirozhkov/viz/Book1_16923614362800/sheet1?publish=yes

# Project conclusions:
1. Talks took place most often in the USA and Canada.
2. The most popular categories are science, technology, culture, and society.
3. In 2020-2021 the number of talks dropped sharply (apparently because of the coronavirus pandemic); the science and culture categories lost popularity, while the global issues and society categories gained significantly.
4. Science, culture, and technology are the most popular categories.
5. The distribution of popular categories does not differ substantially between countries. One exception: global issues is much more popular in the UK than in Canada and the USA, while society is much more popular in Canada and the USA than in the UK.
6. On average, the entertainment and social change categories get the most applause. Cecile Richards' talk "The political progress women have made - and what's next" drew the most applause.
7. On average, the education and business categories draw the most laughter. The single talk with the most laughter, however, is Tom Rielly's "A comic send up of TED2006" from the culture category.
8. There is no clear relationship between the duration of a talk and its number of views.
9. The talk "Do schools kill creativity?" by Sir Ken Robinson has the most views.
10. The longest talk is "3 secrets of Netflix's success" by Reed Hastings.
11. The most common occupations among speakers are Writer, Artist, and Author.
12. Most speakers gave only one talk.
13. Among designers, Tom Wujec spoke most often.
14. Among Tom Wujec's talks, "Learn to use 13th-century astrolabe" has the fewest views.
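
The duration-versus-views finding in the conclusions can be cross-checked numerically outside Tableau, for example with a Pearson correlation coefficient. The sketch below uses made-up numbers purely to show the call; on the real data the same line would be applied to the `duration` and `views` columns of the talks table.

```python
import pandas as pd

# Made-up values for illustration; not taken from the project data.
talks_sample = pd.DataFrame(
    {
        "duration": [300, 600, 900, 1200, 1500],  # seconds
        "views": [5_000_000, 1_000_000, 4_000_000, 2_000_000, 3_000_000],
    }
)

# Pearson correlation: values near 0 suggest no linear relationship,
# values near +1 or -1 suggest a strong one.
r = talks_sample["duration"].corr(talks_sample["views"])
print(round(r, 2))
```

A coefficient close to zero would be consistent with the conclusion, although it only rules out a linear relationship.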