# PyLadies and local Python User Groups

_Last updated: August 4, 2015_

I am not a statistician by trade; far from it.  I did take a few stats & econometrics courses in college, but I won't even consider myself an armchair statistician here. 

I am not making any suggestions about causation, just merely exploring what the [Meetup API][0] has to offer.

This also isn't how I code in general; but I love ~~IPython~~ [Jupyter Notebooks][1], and I wanted an excuse to use it with Pandas (first time I'm using [Pandas][2] too!).

---

This data was used in my EuroPython 2015 talk, [Diversity: We're not done yet][3]. ([Slides][4], video soon)

[0]: http://www.meetup.com/meetup_api/
[1]: https://jupyter.org/
[2]: http://pandas.pydata.org/
[3]: http://www.roguelynn.com/words/were-not-done-yet/
[4]: https://speakerdeck.com/roguelynn/diversity-were-not-done-yet

In [98]:
from __future__ import print_function
from collections import defaultdict
import json
import os
import time

import requests

### Part 1: Grabbing all Python-centric meetup groups

#### NOTE 
This repository includes all the data files that I used (latest update: Aug 4, 2015).  You may skip this part if you don't want to call the Meetup API to get new/fresh data. 

---

#### TIP
Take a look at Meetup's [API Console][0]; I used it when forming API requests as well as getting an idea of pagination for some requests.

---

#### What we're doing
We'll call a few different endpoints from the Meetup API and save the data locally in a `json` file for us to use later.

To get your own Meetup API key, you'll need a regular Meetup user account.  Once you're logged in, you can navigate to the [API Key][1] portion of the API docs to reveal your API key.

API Endpoint docs:

* [Groups][2]

[0]: https://secure.meetup.com/meetup_api/console/
[1]: https://secure.meetup.com/meetup_api/key/
[2]: http://www.meetup.com/meetup_api/docs/2/groups/

In [15]:
def save_output(data, output_file):
    with open(output_file, "w") as f:
        json.dump(data, f)

In [6]:
# Set some global variables
MEETUP_API_KEY = "yeah right"
MEETUP_GROUPS_URL = "https://api.meetup.com/2/groups"
PARAMS = {
    "signed": True,
    "key": MEETUP_API_KEY,
    "topic": "python",
    "category_id": 34,  # 34 = Tech, there are only ~35 categories
    "order": "members",
    "page": 200, # max allowed
    "omit": "group_photo"  # no need for photos in response
}
TOTAL_PAGES = 6  # looked on the API console, 1117 meetup groups as of 7/17, 200 groups per page = 6 pages

The Meetup API [limits requests][0], however their documentation isn't exactly helpful.  Using their headers, I saw that I was limited to 30 requests per 10 seconds.  Therefore, I'll sleep 1 second in between each request to be safe.

[0]: http://www.meetup.com/meetup_api/docs/#limits

In [16]:
def get_meetup_groups():
    meetup_groups = []

    for i in xrange(TOTAL_PAGES):
        PARAMS["offset"] = i
        print("GROUPS: Getting page {0} of {1}".format(i+1, TOTAL_PAGES+1))
        response = requests.get(MEETUP_GROUPS_URL, params=PARAMS)
        if response.ok:
            meetup_groups.extend(response.json().get("results"))
        time.sleep(1)  # don't bombard the Meetup API
    print("GROUPS: Collected {0} Meetup groups".format(len(meetup_groups)))
    return meetup_groups

In [17]:
meetup_groups = get_meetup_groups()

GROUPS: Getting page 1 of 7
GROUPS: Getting page 2 of 7
GROUPS: Getting page 3 of 7
GROUPS: Getting page 4 of 7
GROUPS: Getting page 5 of 7
GROUPS: Getting page 6 of 7
GROUPS: Collected 1135 Meetup groups


In [20]:
# Create a directory to save everything
data_dir = "meetup_data"
if not os.path.exists(data_dir):
    os.makedirs(data_dir)

# Save meetup groups data
output = os.path.join(data_dir, "meetup_groups.json")
save_output(meetup_groups, output)

In [21]:
# inspect one for funsies
meetup_groups[0]

{u'category': {u'id': 34, u'name': u'tech', u'shortname': u'tech'},
 u'city': u'Mountain View',
 u'country': u'US',
 u'created': 1206761791000,
 u'description': u'<p><a href="http://hf.cx/">You should really go over there and learn more about us</a>.</p>\n<p>Hackers / Founders is the largest community of early tech founders in Silicon Valley ( that includes SF ).</p>\n<p><a href="http://hf.cx/coop/">We have a founder\'s co-op accelerator. &nbsp;Learn more here</a></p>\n<p><span>Like reviewing startups? <a href="https://docs.google.com/a/hf.cx/forms/d/1ImzZHqHI1w8azBhTBeXBUn8lLzzdxTx0eq2FAnSdO-8/viewform">Sign up here!</a></span></p>\n<p><span>We host networking events and meetups in </span>Silicon Valley<span>, San Francisco, Berkeley and San Jose.</span></p>',
 u'id': 1084744,
 u'join_mode': u'open',
 u'lat': 37.40999984741211,
 u'link': u'http://www.meetup.com/Hackers-and-Founders/',
 u'lon': -122.08000183105469,
 u'members': 12374,
 u'name': u'Hackers and Founders',
 u'organizer': {

## Part 2: Narrow down & sort the meetup groups

We got a lot returned from searching the `/groups` endpoint with just the "python" topic.  So we should narrow it down a bit, as well as sort out PyLadies groups.

My process is to just narrow down by actual name of the group (e.g. `python`, `py`, `django`, etc).  

Spot checking the results will definitely be needed, but will come a bit later.

In [44]:
search = ["python", "pydata", "pyramid", "py", "django", "flask", "plone"]
omit = ["happy"]  # I realize that a group could be called "happy python user group" or something...

def is_pug(group):
    """
    Return `True` if in `search` key words and not in `omit` keywords.
    """
    group_name = group.get("name").lower()
    for o in omit:
        if o in group_name:
            return False
    for s in search:
        if s in group_name:
            return True
    return False
    
    
def sort_groups(groups):
    """
    Sort groups by 'pyladies' and 'python user groups'.
    """
    pyladies = []
    user_groups = []
    for g in groups:
        if "pyladies" in g.get("name").lower():
            pyladies.append(g)
        else:
            if is_pug(g):
                user_groups.append(g)
    return user_groups, pyladies


In [42]:
user_groups, pyladies = sort_groups(meetup_groups)

In [47]:
# Let's spot check the UGs to see if what we're left with makes sense
# Note: I took a peek at a few (not shown here) and for the most part, 
#       all seems okay
for g in user_groups:
    print(g.get("name"))

The New York Python Meetup Group
San Francisco Python Meetup Group
The Boston Python User Group
Silicon Valley Python Meetup
DC Python
BangPypers - Bangalore Python Users Group
The Austin Python Meetup
The San Francisco Django Meetup Group
Taipei.py - Taipei Python User Group
Django-NYC
PyAtl: Atlanta Python Programmers
PyData London Meetup
Django Boston Meetup Group
Portland Python User Group
The London Python Group - TLPG
PyData NYC
Python Ireland
The Philadelphia Python Users Group (PhillyPUG)
San Francisco PyData
SoCal Python
Bangalore Django User Group
Bay Area Python Interest Group (BayPIGgies)
Stockholm Python User Group
Sydney Python (SyPy)
The Barcelona Python Meetup Group
NOVA-Python
San Diego Python Users Group
Learn Python NYC
Puget Sound Programming Python (PuPPy)
Python Toronto
PyData Boston
Paris.py (Python, Django & friends)
Python Data Science - Seattle - Bellevue
PythonPune
Austin Web Python
Python Users Berlin (PUB)
Hyderabad Python Meetup Group
Amsterdam Python Meet

## Part 3: Find all Python meetup groups with a PyLadies within 50 miles

I've adapted this from a [Java implementation][0] to find if a point is within a radius of another point.  Geo-math is hard.

[0]: http://stackoverflow.com/questions/120283/how-can-i-measure-distance-and-create-a-bounding-box-based-on-two-latitudelongi/123305#123305

In [48]:
from math import sin, cos, asin, degrees, radians, atan2, sqrt

In [49]:
RADIUS = 3958.75  # Earth's radius in miles

In [50]:
def is_within_50_miles(pyladies_coords, python_coords):
    pyladies_lat, pyladies_lon = pyladies_coords[0], pyladies_coords[1]
    python_lat, python_lon = python_coords[0], python_coords[1]
    d_lat = radians(pyladies_lat - python_lat)
    d_lon = radians(pyladies_lon - python_lon)
    sin_d_lat = sin(d_lat / 2)
    sin_d_lon = sin(d_lon / 2)
    a = (sin_d_lat ** 2 + sin_d_lon ** 2 ) * cos(radians(pyladies_lat)) * cos(radians(python_lat))
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    dist = RADIUS * c
    
    return dist <= 50

In [53]:
def get_coords(group):
    return group.get("lat"), group.get("lon")

def get_nearby_python_groups(pyl, collect):
    pyl_coords = get_coords(pyl)
    
    nearby = []
    for group in user_groups:
        pyt_coords = get_coords(group)
        if is_within_50_miles(pyl_coords, pyt_coords):
            nearby.append(group)
    
    collect[pyl.get("name")] = nearby
    return collect

In [54]:
collect = {}
for pylady in pyladies:
    collect = get_nearby_python_groups(pylady, collect)

In [57]:
for item in collect.items():
    print(item[0], len(item[1]))

PyLadiesCZ 2
PyLadies HTX 2
PyLadies Moscow 1
Pyladies India 3
Ann Arbor PyLadies 3
DC PyLadies 4
Inland Empire Pyladies 5
PyLadies RDU 1
PyLadies Amsterdam 4
NYC PyLadies 15
PyLadies ATX 5
PyLadies Wellington 1
PyLadies Boston 9
PyLadies - Twin Cities 3
PyLadies Istanbul 5
PyLadies Vancouver 2
Helsinki PyLadies 3
PyLadies Edinburgh 1
PyLadies Vienna 2
PyLadies London 8
PyLadiesATL 1
PyLadies Munich 2
Hinterland PyLadies 0
Salt Lake PyLadies 5
PyLadies San Diego 1
PyLadies Perú 1
PyLadies Berlin 5
PyLadies of San Francisco 13
PyLadies BCN 2
PyLadies Taiwan 3
PyLadies Dublin 1
Chicago PyLadies 5
PyLadies Pune 3
Seoul PyLadies Meetup 1
PyLadies PDX 1
PyLadies Paris 4
SA PyLadies Meetup 2
Seattle PyLadies 5
PyLadies Montréal 1
PyLadies Manila 1
PyLadies Milano 1
Pyladies.LosAngeles 8
PyLadies Toronto 2
PyLadiesStockholm 2
DFW PyLadies 1


In [82]:
# Save data into pyladies-specific directories
def pylady_dir(pyl):
    _dir = pyl.split()
    _dir = "".join(_dir)
    outdir = os.path.join(data_dir, _dir)
    if not os.path.exists(outdir):
        os.makedirs(outdir)
    return _dir

def save_pyladies():
    for pylady in pyladies:
        name = pylady.get("name")
        subdir = pylady_dir(name)
        outputdir = os.path.join(data_dir, subdir)
        output = os.path.join(outputdir, subdir + ".json")
        save_output(pylady, output)

        groups = collect.get(name)
        for g in groups:
            group_link = g.get("link")
            group_name = group_link.split(".com/")[1][:-1]
            group_name = "".join(group_name)
            outfile = group_name + ".json"
            ug_output = os.path.join(outputdir, outfile)
            save_output(g, ug_output)

In [83]:
save_pyladies()

Sanity check (I have a `tree` command installed via `brew install tree`):

In [89]:
!tree

.
├── Meetup Stats.ipynb
├── Meetup Topics.ipynb
└── meetup_data
    ├── AnnArborPyLadies
    │   ├── AnnArborPyLadies.json
    │   ├── Detroit-Python-User-Group.json
    │   ├── Michigan-Python-Development-Group.json
    │   └── motorcitydjango.json
    ├── ChicagoPyLadies
    │   ├── ChicagoPyLadies.json
    │   ├── ChicagoPythonistas.json
    │   ├── Fox-Valley-Python.json
    │   ├── PyData-Chicago.json
    │   ├── _ChiPy_.json
    │   └── friendlydjango.json
    ├── DCPyLadies
    │   ├── DCPyLadies.json
    │   ├── DCPython.json
    │   ├── NOVA-Python.json
    │   ├── baltimore-python.json
    │   └── django-district.json
    ├── DFWPyLadies
    │   ├── DFWPyLadies.json
    │   └── dfwpython.json
    ├── HelsinkiPyLadies
    │   ├── HelPy-meetups.json
    │   ├── Helsinki-Python-Workshops.json
    │   ├── HelsinkiPyLadies.json
    │   └── Python-Tallinn.json
    ├── HinterlandPyLadies
    │   └── HinterlandPyLadies.json
    ├── InlandEmpirePyladie

## Part 4: Membership join history

#### Note

If getting members from an endpoint returns 0, despite the member count in the group data being a positive number, then the group is set to private & accessible only to members (you can join that group to be able to have access that data, but I did not; I already have too much email).

#### Note

There's a "pseudo" race condition where the group data member # may be one number, but you actually receive a different number (+/- ~3), it's (probably) due to people leaving or joining the group between the group API call and the members API call.

API endpoint docs:

* [Members][0]

[0]: http://www.meetup.com/meetup_api/docs/2/members/

In [114]:
MEETUP_MEMBER_URL = "https://api.meetup.com/2/members"
PARAMS = {
    "signed": True,
    "key": MEETUP_API_KEY,
}

In [115]:
def get_members(group):
    PARAMS["group_id"] = group.get("id")
    members_count = group.get("members")
    print(u"MEMBERS: Getting {0} members for group {1}".format(members_count, group.get("name")))
    pages = members_count / 200
    remainder = members_count % 200
    if remainder > 0:
        pages += 1
    
    members = []
    for i in xrange(pages):
        print("MEMBERS: Iteration {0} out of {1}".format(i+1, pages+1))
        PARAMS["offset"] = i
        resp = requests.get(MEETUP_MEMBER_URL, PARAMS)
        if resp.ok:
            results = resp.json().get("results")
            members.extend(results)
        time.sleep(1)
    print("MEMBERS: Got {0} members".format(len(members)))
    return members

In [118]:
def get_members_collection(pylady, groups):
    pylady_members = get_members(pylady)
    pug_members = defaultdict(list)
    for g in groups:
        pg_mbrs = get_members(g)
        pug_members[g.get("name")].append(pg_mbrs)
    return pylady_members, pug_members

In [120]:
# NOTE: this takes *FOREVER*.  
start = time.time()
for i, item in enumerate(collect.items()):
    print("COLLECTING: {0} out of {1}".format(i+1, len(collect)+1))
    pylady = [p for p in pyladies if p.get("name") == item[0]][0]
    pylady_members, pug_members = get_members_collection(pylady, item[1])
    
    print("COLLECTING: Saving all the data!")
    pylady_name = pylady.get("name")
    outdir = pylady_dir(pylady_name)
    outdir = os.path.join(data_dir, outdir)
    outfile = os.path.join(outdir, "pyladies_members.json")
    save_output(pylady_members, outfile)
    outfile = os.path.join(outdir, "pug_members.json")
    save_output(pug_members, outfile)
end = time.time()
delta_s = end - start
delta_m = delta_s / 60
print("**DONE**")
print("Completed in {:.0f} minutes".format(delta_m))

COLLECTING: 1 out of 46
MEMBERS: Getting 20 members for group PyLadiesCZ
MEMBERS: Iteration 1 out of 2
MEMBERS: Got 20 members
MEMBERS: Getting 305 members for group Python User Group Austria
MEMBERS: Iteration 1 out of 3
MEMBERS: Iteration 2 out of 3
MEMBERS: Got 305 members
MEMBERS: Getting 144 members for group Django Friends – Vienna
MEMBERS: Iteration 1 out of 2
MEMBERS: Got 144 members
COLLECTING: Saving all the data!
COLLECTING: 2 out of 46
MEMBERS: Getting 154 members for group PyLadies HTX
MEMBERS: Iteration 1 out of 2
MEMBERS: Got 154 members
MEMBERS: Getting 829 members for group PyHou - Houston Python Enthusiasts!
MEMBERS: Iteration 1 out of 6
MEMBERS: Iteration 2 out of 6
MEMBERS: Iteration 3 out of 6
MEMBERS: Iteration 4 out of 6
MEMBERS: Iteration 5 out of 6
MEMBERS: Got 829 members
MEMBERS: Getting 319 members for group PyWeb Houston
MEMBERS: Iteration 1 out of 3
MEMBERS: Iteration 2 out of 3
MEMBERS: Got 319 members
COLLECTING: Saving all the data!
COLLECTING: 3 out of

## Part 5: Graphing

Take a look at `Creating Graphs with Pandas and matplotlib.ipynb` for how to visualize this data with Pandas (not sure why I broke it up into two notebooks).