# Who Is J?

## Analysing JOTB diversity network 

One of the main goals of the ‘Yes We Tech’ community is to help create an inclusive space where we can celebrate diversity, give visibility to women in tech, and ensure that everybody has an equal chance to learn, share and enjoy technology-related disciplines.

As co-organisers of the event, we have concentrated our efforts on getting more women speakers on board, under the assumption that a more diverse panel would also enrich the conversation around technology.

We have indeed doubled the number of women giving talks this year, but is this diversity enough? How can we know that we have succeeded in our goal? And, more importantly, what can we learn to create a more diverse event in future editions?

The work that we are sharing here is about two things: data and people. Both should help us find some answers and understand the reasons behind them.

Let's start with a story about data. Data is pretty simple compared with people. Just take a look at the numbers, the small ones, the ones that best describe what happened at the 2016 and 2017 editions of J On The Beach.

In [185]:
import pandas as pd
import numpy as np
import scipy as sp
import pygal
from iplotter import GCPlotter

plotter = GCPlotter()

### Small data analysis

Small data says that last year our 'J' engaged 48 speakers and 299 attendees in this big data thing.
I'm not counting members of the organisation here.

In [210]:
data2016 = pd.read_csv('../input/small_data_2016.csv')
data2016['Women Rate'] = pd.Series(data2016['Women']*100/data2016['Total'])
data2016['Men Rate'] = pd.Series(data2016['Men']*100/data2016['Total'])
data2016

Unnamed: 0,Tribe,Women,Men,Total,Women Rate,Men Rate
0,speakers,5,43,48,10.416667,89.583333
1,attendees,39,260,299,13.043478,86.956522
2,independent,8,44,52,15.384615,84.615385
3,company_teams,28,214,242,11.570248,88.429752
4,company_teams_no_women,0,99,99,0.0,100.0
5,hackathon,0,0,0,,
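
The same two rate columns are computed for each edition, so the calculation could live in one small helper. What follows is only a sketch: `add_rates` is a hypothetical name, and the sample frame is built in memory from two rows of the table above rather than read from the CSV.

```python
import pandas as pd

def add_rates(df):
    """Return a copy of df with 'Women Rate' and 'Men Rate' percentage columns."""
    out = df.copy()
    out['Women Rate'] = out['Women'] * 100.0 / out['Total']
    out['Men Rate'] = out['Men'] * 100.0 / out['Total']
    return out

# Two rows copied from the 2016 table: speakers, and the empty hackathon row.
sample = pd.DataFrame({
    'Tribe': ['speakers', 'hackathon'],
    'Women': [5, 0],
    'Men': [43, 0],
    'Total': [48, 0],
})

rates = add_rates(sample)
print(rates)
```

A row with a total of zero ends up with missing rates instead of raising an error, which is why the hackathon row has empty rate cells for 2016.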


This year there are 40 speakers, a few fewer than last year, while attendance has reached 368 people (compared with 299 last year).

In [211]:
data2017 = pd.read_csv('../input/small_data_2017.csv')
data2017['Women Rate'] = pd.Series(data2017['Women']*100/data2017['Total'])
data2017['Men Rate'] = pd.Series(data2017['Men']*100/data2017['Total'])
data2017

Unnamed: 0,Tribe,Women,Men,Total,Women Rate,Men Rate
0,speakers,11,29,40,27.5,72.5
1,attendees,36,332,368,9.782609,90.217391
2,independent,6,65,71,8.450704,91.549296
3,copmany_teams,30,267,297,10.10101,89.89899
4,company_teams_no_women,0,134,134,0.0,100.0
5,hackathon,4,21,25,16.0,84.0


In [8]:
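# Share of this year's 368 attendees that is growth over last year's 299
# (the difference is divided by the 2017 total, not by the 2016 baseline)
increase = 100 - 299*100.00/368
increase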

18.75
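
There are two ways to read "how much did attendance grow", and they give different numbers. The sketch below puts them side by side using only the totals from the two tables; the variable names are illustrative.

```python
# Attendee totals taken from the 2016 and 2017 tables.
attendees_2016 = 299
attendees_2017 = 368

extra = attendees_2017 - attendees_2016

# Growth relative to the 2016 baseline: the usual "x% more attendees".
growth_vs_2016 = extra * 100.0 / attendees_2016

# Share of the 2017 audience that is growth over 2016; this is the
# quantity that 100 - 299*100.00/368 evaluates to.
share_of_2017 = extra * 100.0 / attendees_2017

print(round(growth_vs_2016, 2))
print(round(share_of_2017, 2))
```

Measured against last year's attendance the growth is about 23%, while 18.75% is the part of this year's audience that is new growth.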

It is also noticeable that big data is bigger than ever, and this year we have included workshops and a hackathon.

The more the better, right? Let's continue, because there are more numbers behind those ones: numbers that will give us some signs of diversity.

#### Diversity

When it comes to speakers, this year **27.5%** of the people speaking at J are women, compared with roughly **10.4%** last year.

In [212]:
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    ['2016', data2016['Women Rate'][0], data2016['Men Rate'][0],''],
    ['2017', data2017['Women Rate'][0], data2017['Men Rate'][0],''],
]
options = {
    "title": 'Speakers at JOTB',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '50%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}

plotter.plot(data,chart_type='ColumnChart',chart_package='corechart', options=options)

However, and this is the worrying thing, the participation of women as attendees has slightly dropped from a not too ambitious **13%** to a disappointing **9.8%**. So we have 69 more attendees than last year (368 vs 299) but zero impact on attracting a wider variety of people.

In [213]:
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    ['2016', data2016['Women Rate'][1], data2016['Men Rate'][1],''],
    ['2017', data2017['Women Rate'][1], data2017['Men Rate'][1],''],
]
options = {
    "title": 'Attendees at JOTB',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '55%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}

plotter.plot(data,chart_type='ColumnChart',chart_package='corechart', options=options)
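
Whether a drop from 13% to 9.8% is more than year-to-year noise can be checked roughly with a two-proportion z-test. This is not part of the original analysis: it is a minimal sketch using only the attendee counts from the two tables (39 of 299 in 2016, 36 of 368 in 2017), and `two_proportion_z` is a hypothetical helper name.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: both proportions are equal."""
    p1 = float(x1) / n1
    p2 = float(x2) / n2
    pooled = float(x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Women among attendees: 2016 first, then 2017.
z, p_value = two_proportion_z(39, 299, 36, 368)
print(round(z, 2))
print(round(p_value, 2))
```

With these counts z comes out at roughly 1.3 and the p-value stays above 0.05, so on numbers this small the drop alone cannot be told apart from chance. That does not make a rate below 10% any less disappointing.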

#### Why did this happen?

We don’t really know. But we continued looking at the numbers and realised that **30** of the **45** companies that enrolled two or more people didn't include any women on their lists. That means **31%** of the mass of attendees. A follow-up worth doing is to correlate team size with the percentage of women, to check whether smaller teams are less likely to include a woman on their lists.

In [214]:
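# Attendees enrolled through companies (teams with and without women, as counted here)
companies_team = data2017['Total'][3] + data2017['Total'][4]
# Percentage of those attendees who came in teams without any women
mass_represented = pd.Series(data2017['Total'][4]*100/companies_team)
# Remaining percentage: attendees in teams that include at least one woman
women_represented = pd.Series(100 - mass_represented)
mass_represented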

0    31
dtype: int64
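
The team-size question could be explored with a plain Pearson correlation between team size and the percentage of women in each team. The notebook only has aggregated totals, so the per-company rows below are hypothetical and exist purely to show the shape of the calculation; `pearson` is an illustrative helper, not something taken from the project.

```python
import math

# HYPOTHETICAL (team size, number of women) pairs, one per company.
# These are made-up rows, not JOTB registration data.
teams = [(2, 0), (2, 0), (3, 0), (3, 1), (5, 1), (8, 2), (12, 3)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

sizes = [size for size, _ in teams]
women_pct = [women * 100.0 / size for size, women in teams]

r = pearson(sizes, women_pct)
print(round(r, 2))
```

A clearly positive `r` on the real registration list would support the idea that smaller teams are less likely to include a woman. With very small teams the percentage can only take a few values, so the result should be read with care.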

For us this is not a good sign. Although our ability to bring people together has grown at our monthly meetups (the ones that attempt to create this culture of equality in Málaga), that engagement doesn’t yet have a big impact on other events.

Again, I'm not blaming companies here: if we look at the participation rate of women who are not part of a team, their representation has also dropped by almost **50%**.

In [215]:
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    [data2016['Tribe'][2], data2016['Women Rate'][2], data2016['Men Rate'][2],''],
    [data2016['Tribe'][3], data2016['Women Rate'][3], data2016['Men Rate'][3],''],
    [data2016['Tribe'][5], data2016['Women Rate'][5], data2016['Men Rate'][5],''],
]
options = {
    "title": '2016 JOTB Edition',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '55%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}

plotter.plot(data,chart_type='ColumnChart',chart_package='corechart', options=options)

In [23]:
data = [
    ['Tribe', 'Women', 'Men', {"role": 'annotation'}],
    [data2017['Tribe'][2], data2017['Women Rate'][2], data2017['Men Rate'][2],''],
    [data2017['Tribe'][3], data2017['Women Rate'][3], data2017['Men Rate'][3],''],
    [data2017['Tribe'][5], data2017['Women Rate'][5], data2017['Men Rate'][5],''],
]
options = {
    "title": '2017 JOTB Edition',
    "width": 600,
    "height": 400,
    "legend": {"position": 'top', "maxLines": 3},
    "bar": {"groupWidth": '55%'},
    "isStacked": "true",
    "colors": ['#984e9e', '#ed1c40'],
}

plotter.plot(data,chart_type='ColumnChart',chart_package='corechart', options=options)

Before blaming anyone or falling too quickly into self-indulgence, there is still more data to play with.

A note aside: what comes next is nothing but an experiment. Nothing is categorical or has been made with the intention of offending anybody. As our t-shirt labels say: no programmer has been injured in the creation of the following data game.

## Social network analysis
The next story talks about people. The people around J, the ones who follow, are followed by, interact with, and create the chances of a more diverse and interesting conference. 

It is also a story about the people who organise this conference. When we started to plan a conference like this, all we did was think about what could be interesting for the people who come. To do that we used what we already knew about cool people who do amazing things with data and JVM technologies. And this means looking into our own networks and following the suggestions of people we trust.

So, if we assume that we are biased by the people around us, it seemed a good idea to first learn what the network of people around J looks like, to see what chances we have of bringing in someone different and unusual who can add value to the conference.

For the moment, since this is an experiment meant to trigger your reaction, we will look at J's Twitter account.

Indeed, a real-world network would have more numbers and people to look at, but a digital social network is still about human interactions, conversations and knowledge sharing.

For this experiment we've used the `sexmachine` Python library https://pypi.python.org/pypi/SexMachine/ and the 'Twitter Gender Distribution' project published on GitHub https://github.com/ajdavis/twitter-gender-distribution to find out the gender of a specific Twitter account.
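
To give an idea of why so many accounts end up as `undetermined`, here is a toy stand-in for a first-name lookup. It is not the `sexmachine` API: `NAME_TABLE` and `guess_gender` are hypothetical, and the three names are taken from the sample rows printed later in this notebook. The real library is assumed to work from a far larger list of names.

```python
# Toy first-name lookup; NOT the sexmachine API.
# The table is a hypothetical three-entry sample.
NAME_TABLE = {
    "alberto": "male",
    "anastazja": "female",
    "beatriz": "female",
}

def guess_gender(display_name):
    """Guess a gender label from the first word of a display name."""
    parts = display_name.strip().split()
    if not parts:
        return "undetermined"
    return NAME_TABLE.get(parts[0].lower(), "undetermined")

for name in ["Alberto Moratilla", "Beatriz", "47 Degrees", "Diario SUR"]:
    print(name, guess_gender(name))
```

Company, media and community accounts have no first name to match, so any approach of this kind leaves them undetermined.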

In [319]:
run index.py jotb17


Of the roughly **50%** of J's friends whose gender could be identified, the women/men split is **20/80**. Friends, in Twitter terms, are the accounts that J follows.

In [158]:
whoisj = pd.read_json('../out/jotb17.json', orient = 'columns')
whoisj['jotb17']['female_rate']
whoisj['jotb17']['male_rate']
whoisj['jotb17']['nonbinary_rate']
whoisj

Unnamed: 0,jotb17
favourites_count,1272
female_count,175
female_rate,14%
followers_count,1138
followers_list,"{u'Jodoniwi': {u'lang': u'es', u'favourites_co..."
friends_count,177
friends_list,"{u'rgransberger': {u'lang': u'de', u'favourite..."
gender,undetermined
id,3899375963
lang,es


In [159]:
friends_total = whoisj['jotb17']['friends_count']
friends_total

177

In [160]:
followers_total = whoisj['jotb17']['followers_count']
followers_total

1138

In [161]:
people = pd.read_json(whoisj['jotb17'].to_json())
people

Unnamed: 0,favourites_count,female_count,female_rate,followers_count,followers_list,friends_count,friends_list,gender,id,lang,location,male_count,male_rate,name,nonbinary_count,nonbinary_rate,statuses_count,total_count,undefined_count,undefined_rate
1968damasco,1272,175,14%,1138,"{u'lang': u'es', u'favourites_count': 1, u'nam...",177,,undetermined,3899375963,es,"Málaga, España",546,44%,J On The Beach,5,0%,1228,1233,507,41%
47deg,1272,175,14%,1138,"{u'lang': u'en', u'favourites_count': 2122, u'...",177,"{u'lang': u'en', u'favourites_count': 2122, u'...",undetermined,3899375963,es,"Málaga, España",546,44%,J On The Beach,5,0%,1228,1233,507,41%
4dokoneko,1272,175,14%,1138,"{u'lang': u'ru', u'favourites_count': 268, u'n...",177,,undetermined,3899375963,es,"Málaga, España",546,44%,J On The Beach,5,0%,1228,1233,507,41%
4lberto,1272,175,14%,1138,"{u'lang': u'es', u'favourites_count': 2420, u'...",177,,undetermined,3899375963,es,"Málaga, España",546,44%,J On The Beach,5,0%,1228,1233,507,41%
666Anastazja999,1272,175,14%,1138,"{u'lang': u'pl', u'favourites_count': 574, u'n...",177,,undetermined,3899375963,es,"Málaga, España",546,44%,J On The Beach,5,0%,1228,1233,507,41%
6P592Otv,1272,175,14%,1138,"{u'lang': u'en', u'favourites_count': 4338, u'...",177,,undetermined,3899375963,es,"Málaga, España",546,44%,J On The Beach,5,0%,1228,1233,507,41%
ADRYVERA27,1272,175,14%,1138,"{u'lang': u'es', u'favourites_count': 11, u'na...",177,,undetermined,3899375963,es,"Málaga, España",546,44%,J On The Beach,5,0%,1228,1233,507,41%
AIOM_oficial,1272,175,14%,1138,"{u'lang': u'es', u'favourites_count': 77, u'na...",177,,undetermined,3899375963,es,"Málaga, España",546,44%,J On The Beach,5,0%,1228,1233,507,41%
APA42,1272,175,14%,1138,"{u'lang': u'en', u'favourites_count': 964, u'n...",177,,undetermined,3899375963,es,"Málaga, España",546,44%,J On The Beach,5,0%,1228,1233,507,41%
APC_AssoFrance,1272,175,14%,1138,"{u'lang': u'en', u'favourites_count': 450, u'n...",177,,undetermined,3899375963,es,"Málaga, España",546,44%,J On The Beach,5,0%,1228,1233,507,41%


In [162]:
followers = pd.read_json(people['followers_list'].to_json(), orient = 'index')
followers

Unnamed: 0,favourites_count,followers_count,friends_count,gender,id,lang,location,name,statuses_count
1968damasco,1.0,19.0,302.0,male,7.087637e+17,es,,Safwan Alsamawai,6.0
47deg,2122.0,3208.0,466.0,undetermined,1.870748e+08,en,Seattle | Spain | London,47 Degrees,1502.0
4dokoneko,268.0,669.0,238.0,male,3.158172e+08,ru,Tucson\r,Reid Joyner,487.0
4lberto,2420.0,435.0,485.0,male,2.062931e+08,es,Oblast de Guadalajara. Madrid,Alberto Moratilla,276.0
666Anastazja999,574.0,254.0,1098.0,female,7.149098e+17,pl,"Małopolskie, Polska",Anastazja Lisowska,103.0
6P592Otv,4338.0,1015.0,4730.0,male,2.809616e+09,en,,STJEPAN TOKIC,7497.0
ADRYVERA27,11.0,578.0,2741.0,undetermined,3.903435e+09,es,"Alicante, España",ADRY VERA,196.0
AIOM_oficial,77.0,586.0,2754.0,undetermined,3.638458e+08,es,Málaga,AIOM,249.0
APA42,964.0,401.0,699.0,undetermined,8.012086e+07,en,,APA,7542.0
APC_AssoFrance,450.0,1083.0,4373.0,undetermined,3.187869e+09,en,France,APC FRANCE Asso.,706.0


In [216]:
followers['gender'].value_counts()

male             474
undetermined     454
female           135
mostly_female     31
mostly_male       29
nonbinary          4
Name: gender, dtype: int64

In [296]:
followers_dist = followers['gender'].value_counts()

followers_map = pygal.Treemap(height=400)
followers_map.title = 'Followers Gender Map'

# One tile per gender label, largest first. Iterating over the labels avoids
# relying on the positional order of value_counts().
for gender, count in followers_dist.items():
    followers_map.add(gender.replace('_', ' ').title(), int(count))

followers_map.render_in_browser()


In [252]:
lang_counts = followers['lang'].value_counts()

lang_followers_map = pygal.Treemap(height=400)
lang_followers_map.title = 'Followers Language Map'

# One tile per language code, sized by the number of followers using it
for lang, count in lang_counts.items():
    lang_followers_map.add(lang, int(count))

lang_followers_map.render_in_browser()


In [115]:
followers['location'].value_counts()

                                  303
Málaga                             39
Málaga, España                     21
Madrid                             20
Spain                              13
España                             11
Málaga, Andalucía                  10
London, England                     9
Málaga, Spain                       9
London                              9
Malaga                              8
Portugal                            8
Украина                             7
Madrid, Spain                       7
Madrid, Comunidad de Madrid         6
Malaga, Spain                       6
Sevilla                             5
Porto, Portugal                     5
Almería, España                     4
Seattle, WA                         4
San Francisco                       4
New York, USA                       4
Bulgaria                            4
Dublin City, Ireland                4
Barcelona                           4
Stockholm, Sweden                   3
San Francisc
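
Free-text locations split the same city across several spellings ("Málaga", "Malaga, Spain", and so on). A minimal sketch of how they could be grouped follows, using only the Málaga and Madrid rows visible in the output above; `city_key` is a hypothetical helper that keeps the part before the first comma and strips accents.

```python
import unicodedata
from collections import Counter

# Counts copied from the visible rows of the location output (Málaga and Madrid only).
raw_counts = {
    "Málaga": 39,
    "Málaga, España": 21,
    "Málaga, Andalucía": 10,
    "Málaga, Spain": 9,
    "Malaga": 8,
    "Malaga, Spain": 6,
    "Madrid": 20,
    "Madrid, Spain": 7,
    "Madrid, Comunidad de Madrid": 6,
}

def city_key(location):
    """Lower-case, accent-free first component of a free-text location."""
    city = location.split(",")[0].strip().lower()
    decomposed = unicodedata.normalize("NFKD", city)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

by_city = Counter()
for location, count in raw_counts.items():
    by_city[city_key(location)] += count

print(by_city.most_common())
```

Grouped this way, the rows shown add up to 93 followers in Málaga and 33 in Madrid, which makes the local weight of the network easier to see than the raw list does.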

In [107]:
following = pd.read_json(people['friends_list'].to_json(), orient = 'index')
following

Unnamed: 0,favourites_count,followers_count,friends_count,gender,id,lang,location,name,statuses_count
1968damasco,,,,,,,,,
47deg,2122.0,3208.0,466.0,undetermined,1.870748e+08,en,Seattle | Spain | London,47 Degrees,1502.0
4dokoneko,,,,,,,,,
4lberto,,,,,,,,,
666Anastazja999,,,,,,,,,
6P592Otv,,,,,,,,,
ADRYVERA27,,,,,,,,,
AIOM_oficial,,,,,,,,,
APA42,,,,,,,,,
APC_AssoFrance,,,,,,,,,


In [108]:
following['gender'].value_counts()

undetermined     86
male             68
female           13
mostly_female     5
mostly_male       4
nonbinary         1
Name: gender, dtype: int64
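
The 20/80 split quoted earlier can be reproduced from these counts. The sketch below assumes that `mostly_female` and `mostly_male` are folded into women and men, and that the single `nonbinary` account is left out of the two-way split; both choices are assumptions made here, not something the notebook states.

```python
# Gender counts for the accounts J follows, copied from the output above.
following_counts = {
    "undetermined": 86,
    "male": 68,
    "female": 13,
    "mostly_female": 5,
    "mostly_male": 4,
    "nonbinary": 1,
}

total = sum(following_counts.values())
identified = total - following_counts["undetermined"]

# Assumption: the "mostly_*" labels are counted with their main label.
women = following_counts["female"] + following_counts["mostly_female"]
men = following_counts["male"] + following_counts["mostly_male"]

identified_share = identified * 100.0 / total
women_share = women * 100.0 / (women + men)

print(round(identified_share, 1))
print(round(women_share, 1))
```

About half of the 177 accounts get a gender label, and among those labelled as women or men the share of women is 20%.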

In [112]:
following['lang'].value_counts()

en       120
es        37
fr         8
de         5
pt         2
it         2
hu         1
ru         1
en-gb      1
Name: lang, dtype: int64

In [113]:
following['location'].value_counts()

                                        32
London                                   9
San Francisco, CA                        7
Madrid                                   5
Málaga                                   4
Málaga, Spain                            3
Worldwide                                3
Madrid, Comunidad de Madrid              3
Global                                   3
Seattle, WA                              2
Madrid, Spain                            2
London, England                          2
France                                   2
Málaga, Andalucía                        2
Málaga, España                           2
Cambridge, England                       2
London, UK                               2
Germany                                  2
Switzerland                              2
Maringá, Brasil                          1
Ottignies-Louvain-la-Neuve, Belgique     1
Genève, Suisse                           1
In JUGs around the world!                1
Baños del C

### Tweets analysis

In [218]:
run tweets.py yeswetech_ 1200


In [219]:
j_network = pd.read_json('../out/yeswetech__tweets.json', orient = 'index')
j_network

Unnamed: 0,description,gender,hashtags,location,name
Adolfo_fdz,Business Development Manager Spain & Portugal ...,male,[YesWeTech],DUB | MAD | BCN | LIS,Adolfo Fernández
Andreitakas,Hablo por los codos. Docente y consultora de #...,female,[],Málaga,Andrea Castro
AnjanaVakil,"Lover of languages, for humans and machines. P...",female,[],"Berlin, Germany",Anjana @ AlterConf
AnneliseGripp,"Especialista Transformação Ágil,Consultora em ...",female,[],"Sao Paulo, Brasil",Annelise Gripp
AzaharVyb,...,undetermined,[],,Azahar
Beatrizhll,"Love music, dancing & traveling. UX Research &...",female,"[women, sciencefiction]","Madrid, Segovia & the World",Beatriz
BicEuronova,"Apoyamos la creación, incubación y consolidaci...",undetermined,"[mujeres, tecnología, youtech]","PTA, Málaga",BIC Euronova
CCComUMA,Cuenta oficial de la Facultad de Ciencias de l...,undetermined,"[CAV, CambioSocial, feminismo, TheFutureisFemale]","Málaga, España",Fac.ComunicaciónUMA
CrisRojoLu,"All my Tweets are my own view. Love TECH, lear...",male,[],,Cris Rojo
DiarioSUR,"#Noticias de #últimahora de #Málaga, España y ...",undetermined,[],Málaga,Diario SUR


In [220]:
j_network['gender'].value_counts()

undetermined     76
female           34
male             12
mostly_female     7
nonbinary         3
mostly_male       1
Name: gender, dtype: int64

In [317]:
from collections import Counter

# Pair every hashtag with the gender inferred for the account that used it;
# accounts with an empty hashtag list contribute nothing.
key_pairs = [
    (tag, gender)
    for tags, gender in zip(j_network['hashtags'], j_network['gender'])
    for tag in tags
]

# Count each (hashtag, gender) pair and list the most frequent first
sorted_x = Counter(key_pairs).most_common()
sorted_x

[((u'BigData', u'undetermined'), 10),
 ((u'1Abril', u'undetermined'), 10),
 ((u'IoT', u'undetermined'), 7),
 ((u'JOTB17', u'undetermined'), 6),
 ((u'Taller', u'undetermined'), 4),
 ((u'bigdata', u'undetermined'), 4),
 ((u'WomenTechmakers', u'undetermined'), 4),
 ((u'1Abril', u'female'), 4),
 ((u'SEO', u'undetermined'), 4),
 ((u'distributedsystems', u'undetermined'), 3),
 ((u'wtmmalaga2017', u'undetermined'), 3),
 ((u'python', u'undetermined'), 3),
 ((u'data', u'undetermined'), 2),
 ((u'frasesdemadres', u'undetermined'), 2),
 ((u'mujeres', u'undetermined'), 2),
 ((u'womenintech', u'undetermined'), 2),
 ((u'tecnolog\xeda', u'female'), 2),
 ((u'gendergap', u'undetermined'), 2),
 ((u'YWT', u'undetermined'), 2),
 ((u'YesWeTech', u'undetermined'), 2),
 ((u'frasesdemadres', u'female'), 2),
 ((u'Taller', u'female'), 2),
 ((u'uxspain', u'undetermined'), 2),
 ((u'JOTB17', u'male'), 2),
 ((u'JOTB17', u'female'), 1),
 ((u'scala', u'nonbinary'), 1),
 ((u'InteligenciaArtifical', u'undetermined'), 1)

In [293]:
# Flatten every account's hashtag list and lower-case the tags so that
# 'BigData' and 'bigdata' are counted together
tags = [tag.lower() for hashtags in j_network['hashtags'] for tag in hashtags]

pd.Series(tags).value_counts()

1abril                       14
bigdata                      14
jotb17                        9
iot                           7
taller                        6
seo                           5
womentechmakers               5
tecnología                    4
frasesdemadres                4
yeswetech                     3
wtmmalaga2017                 3
python                        3
distributedsystems            3
womenintech                   3
uxspain                       2
gendergap                     2
data                          2
sonido3d                      2
ywt                           2
opensouthcode                 2
mujeres                       2
microservices                 1
cav                           1
css                           1
charlas                       1
programación                  1
mujeryciencia                 1
yesyoutech                    1
scala                         1
yesshecan                     1
                             ..
desayuno

## Credits

A few lines to credit this work. Thanks to M. Carmen Correa for finding the time between work and family to collect all this data, code it in Python and deal with the Twitter API. Thanks also to Ángela Dini and Gema Sánchez for keeping this project energised and sharing it with the press and the community. Thanks as well to the women who have joined Yes We Tech meetups not just once or twice but many times, and of course thank you for your interest, your support and your time. If I deserve any credit, it is only for the attempt to organise a space free of the same old, boring macho thing. Hope you enjoyed it, and thank you.

Shared on GitHub: https://github.com/YesWeTech/whoIsJ