# WikiBeania - Gathering All of the Knowledge on Beans

Beans are everywhere, no doubt about that. But one complaint that comes up in many of the bean lectures I have attended is: there are too many beans! I can't keep track of all of my beans! How many beans does it take to get to the center of a Tootsie Pop?

To them I say, "you are right." There is no central hub of knowledge for beans. When I Googled "how many beans are there?", do you know what was returned? Nothing helpful.

Therefore, I am proposing to build a knowledge center for beans, or as we have decided to call it, WikiBeania.

In [32]:
# Let's get some helper beans
import wikipediaapi as wikibeaniaapi
import re as clean_beans

WikiBeania directly plagiarizes and copies results from Wikipedia, a popular website for learning more about a topic. But the vastness of Wikipedia can be overwhelming, especially for large topics such as beans.

In [3]:
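# let's connect to wikipedia and name it wikibeania
# (newer wikipedia-api releases may also require a user_agent argument here)
wikibeania = wikibeaniaapi.Wikipedia(
    language='en',
    extract_format=wikibeaniaapi.ExtractFormat.WIKI
)

# run a quick test to take a look at the original bean wikibeania page
# note: page() builds the page object lazily; the .exists() check below is what actually contacts Wikipedia
try:
    the_original_bean = wikibeania.page('Bean')
    print("Connected to Beans")
except Exception:
    print("Alert! Bean Connection Failed")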

Connected to Beans


In [4]:
the_original_bean.exists()

True

__Excellent__ we are connected to a wikibeania bean

In [5]:
# Let's learn more
the_original_bean.summary

'A bean is the seed of one of several genera of the flowering plant family Fabaceae, which are used as vegetables for human or animal food. They can be cooked in many different ways, including boiling, frying, and baking, and are used in several traditional dishes throughout the world.'

__Wow__ what knowledge of beans

Let's dig in a bit deeper and see how many beans are referenced in this wikibeania bean

In [6]:
bean_searcher = clean_beans.compile(r'\bbean\b|\bbeans\b', flags=clean_beans.IGNORECASE)
how_many_beans = bean_searcher.findall(the_original_bean.text)

In [33]:
print("There were {} beans referenced in the og bean".format(len(how_many_beans)))

There were 114 beans referenced in the og bean


__How stunning__
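
To see what that pattern does and does not count, here is a minimal sketch that runs the same regular expression on a made-up sentence. The sample text is hypothetical and does not come from Wikipedia.

```python
import re

# Same pattern as bean_searcher above: whole-word "bean" or "beans", any case.
bean_pattern = re.compile(r'\bbean\b|\bbeans\b', flags=re.IGNORECASE)

# Hypothetical sample text, invented for illustration.
sample_text = "Beans are great. A bean, two BEANS, a beanbag, and some soybeans."

matches = bean_pattern.findall(sample_text)
print(matches)       # ['Beans', 'bean', 'BEANS']
print(len(matches))  # 3
```

The word boundaries (`\b`) mean that "beanbag" and "soybeans" are not counted, so the 114 above is a count of standalone beans only.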

In [37]:
# let's make a quick function to see what a wikibeania bean references
def print_beans(bean_page):
    bean_links = bean_page.links
    for bean_title in sorted(bean_links.keys()):
        print("%s: %s" % (bean_title, bean_links[bean_title]))

In [38]:
print_beans(the_original_bean)

Adzuki bean: Adzuki bean (id: ??, ns: 0)
Afghanistan: Afghanistan (id: ??, ns: 0)
Alfalfa: Alfalfa (id: ??, ns: 0)
Ancient Egypt: Ancient Egypt (id: ??, ns: 0)
Annibale Carracci: Annibale Carracci (id: ??, ns: 0)
Antinutrient: Antinutrient (id: ??, ns: 0)
Appaloosa bean: Appaloosa bean (id: ??, ns: 0)
Arachis: Arachis (id: ??, ns: 0)
Arachis hypogaea: Arachis hypogaea (id: ??, ns: 0)
Bacteria: Bacteria (id: ??, ns: 0)
Bad Bug Book: Bad Bug Book (id: ??, ns: 0)
Bahamas: Bahamas (id: ??, ns: 0)
Baked beans: Baked beans (id: ??, ns: 0)
Bean (disambiguation): Bean (disambiguation) (id: ??, ns: 0)
Beansprout: Beansprout (id: ??, ns: 0)
Bibcode (identifier): Bibcode (identifier) (id: ??, ns: 0)
Black gram: Black gram (id: ??, ns: 0)
Black turtle bean: Black turtle bean (id: ??, ns: 0)
Black-eyed pea: Black-eyed pea (id: ??, ns: 0)
Brazil: Brazil (id: ??, ns: 0)
Broad bean: Broad bean (id: ??, ns: 0)
Broad beans: Broad beans (id: ??, ns: 0)
Cabbage: Cabbage (id: ??, ns: 0)
Cajanus: Cajanus (i

__this mere bean knows a lot of beans__

How many beans does it know, though? How many beans do the beans of the original bean know? And how many beans do the beans of the beans of the original bean know? These questions are what WikiBeania sets out to answer.

In [10]:
# another helper function to find beans that other beans know..
bean_regex = clean_beans.compile(r'\bbean\b|\bbeans\b', flags=clean_beans.IGNORECASE)

original_bean = wikibeania.page('Bean')
previous_beans = ['Bean']  # every bean title visited so far

def recurse_of_the_beans(original_bean, previous_beans):
    # depth-first walk: follow each linked title that mentions a bean and is not yet visited
    for new_bean in original_bean.links:
        if bean_regex.search(new_bean) and new_bean not in previous_beans:
            new_bean_wiki = wikibeania.page(new_bean)
            previous_beans.append(new_bean)  # mutates the caller's list in place
            print('OOOEEE Another Bean: ', new_bean)
            recurse_of_the_beans(new_bean_wiki, previous_beans)

In [11]:
recurse_of_the_beans(original_bean, previous_beans)

OOOEEE Another Bean:  Adzuki bean
OOOEEE Another Bean:  Bean weevil
OOOEEE Another Bean:  Black adzuki bean
OOOEEE Another Bean:  White adzuki bean paste
OOOEEE Another Bean:  Azuki bean
OOOEEE Another Bean:  Kidney bean
OOOEEE Another Bean:  Red beans and rice
OOOEEE Another Bean:  15 bean soup
OOOEEE Another Bean:  Baked beans
OOOEEE Another Bean:  Baked bean
OOOEEE Another Bean:  Baked bean sandwich
OOOEEE Another Bean:  Bean dip
OOOEEE Another Bean:  Bean pie
OOOEEE Another Bean:  Bean salad
OOOEEE Another Bean:  Borracho beans
OOOEEE Another Bean:  Black turtle bean
OOOEEE Another Bean:  Black bean (disambiguation)
OOOEEE Another Bean:  Black Bean Games
OOOEEE Another Bean:  Black bean sauce
OOOEEE Another Bean:  Black bean paste
OOOEEE Another Bean:  Fermented bean curd
OOOEEE Another Bean:  Red bean cake
OOOEEE Another Bean:  Cowboy beans
OOOEEE Another Bean:  Dilly beans
OOOEEE Another Bean:  Green bean casserole
OOOEEE Another Bean:  Green bean
OOOEEE Another Bean:  Common bea

OOOEEE Another Bean:  William Bean (geologist)
OOOEEE Another Bean:  William Bennett Bean
OOOEEE Another Bean:  William Jackson Bean
OOOEEE Another Bean:  George Ewart Bean
OOOEEE Another Bean:  Willie Bean
OOOEEE Another Bean:  Talk:William Bean (disambiguation)
OOOEEE Another Bean:  Talk:Billy Bean (disambiguation)
OOOEEE Another Bean:  Charles Bean
OOOEEE Another Bean:  Charlie Bean (disambiguation)
OOOEEE Another Bean:  Charlie Bean (economist)
OOOEEE Another Bean:  Charlie Bean (filmmaker)
OOOEEE Another Bean:  Talk:Charlie Bean
OOOEEE Another Bean:  Colin Bean
OOOEEE Another Bean:  Colter Bean
OOOEEE Another Bean:  Talk:Colter Bean
OOOEEE Another Bean:  Frances Bean Cobain
OOOEEE Another Bean:  Freebie and The Bean
OOOEEE Another Bean:  Freebie and the Bean (TV series)
OOOEEE Another Bean:  Freebie and the Bean
OOOEEE Another Bean:  George Bean
OOOEEE Another Bean:  Hannibal Roy Bean
OOOEEE Another Bean:  Hugh Bean
OOOEEE Another Bean:  Jake Bean
OOOEEE Another Bean:  James Bean


## HOLY BEANS!!!
That's a lot of beans.

A dragon tongue bean, a red bean pancake, a bean film, Beans the rapper, a Battle of Bean's Station.

Beans continue to amaze WikiBeania. And it's clear that beans are often referencing other beans. What a great beanmmunity.
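
The bean hunt above recurses once per newly found bean, which could in principle hit Python's recursion limit on a long chain of beans. A possible alternative, sketched here with an explicit queue and a set of visited titles, should reach the same set of titles (though in a different order). The `toy_links` graph and the `crawl_beans` name are hypothetical stand-ins for `page.links` and are not part of the original notebook.

```python
import re
from collections import deque

bean_pattern = re.compile(r'\bbeans?\b', flags=re.IGNORECASE)

# Hypothetical link graph standing in for page.links; titles are illustrative only.
toy_links = {
    'Bean': ['Adzuki bean', 'Brazil', 'Baked beans'],
    'Adzuki bean': ['Bean', 'Red bean paste'],
    'Baked beans': ['Bean', 'Tomato sauce'],
    'Red bean paste': ['Adzuki bean'],
}

def crawl_beans(start, links):
    """Collect every bean-titled page reachable from start via bean-titled links."""
    seen = {start}           # set membership checks are O(1)
    order = [start]          # titles in the order they were discovered
    queue = deque([start])
    while queue:
        title = queue.popleft()
        for linked in links.get(title, []):
            if bean_pattern.search(linked) and linked not in seen:
                seen.add(linked)
                order.append(linked)
                queue.append(linked)
    return order

found = crawl_beans('Bean', toy_links)
print(found)  # ['Bean', 'Adzuki bean', 'Baked beans', 'Red bean paste']
```

Swapping `links.get(title, [])` for a real page lookup would be the only change needed to point this at live pages, at the cost of one request per bean.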

Let's look at how many beans each one of these beans is talking about.

In [12]:
number_of_bean_references = 0
number_of_bean_pages = 0
bean_text = []
for bean in previous_beans:
    new_bean = wikibeania.page(bean)
    number_of_bean_pages += 1
    bean_temp_text = new_bean.text
    bean_text.append(bean_temp_text)
    num_beans_temp = len(bean_regex.findall(bean_temp_text))
    number_of_bean_references += num_beans_temp
    print("OOOEEE check out this bean: ", bean)
    print("Look at all the beans in this bean: ", num_beans_temp)

OOOEEE check out this bean:  Bean
Look at all the beans in this bean:  114
OOOEEE check out this bean:  Adzuki bean
Look at all the beans in this bean:  60
OOOEEE check out this bean:  Bean weevil
Look at all the beans in this bean:  3
OOOEEE check out this bean:  Black adzuki bean
Look at all the beans in this bean:  13
OOOEEE check out this bean:  White adzuki bean paste
Look at all the beans in this bean:  64
OOOEEE check out this bean:  Azuki bean
Look at all the beans in this bean:  60
OOOEEE check out this bean:  Kidney bean
Look at all the beans in this bean:  32
OOOEEE check out this bean:  Red beans and rice
Look at all the beans in this bean:  19
OOOEEE check out this bean:  15 bean soup
Look at all the beans in this bean:  19
OOOEEE check out this bean:  Baked beans
Look at all the beans in this bean:  120
OOOEEE check out this bean:  Baked bean
Look at all the beans in this bean:  120
OOOEEE check out this bean:  Baked bean sandwich
Look at all the beans in this bean:  7
OO

OOOEEE check out this bean:  Flageolet bean
Look at all the beans in this bean:  5
OOOEEE check out this bean:  Green beans
Look at all the beans in this bean:  50
OOOEEE check out this bean:  Rattlesnake bean
Look at all the beans in this bean:  5
OOOEEE check out this bean:  String Beans (film)
Look at all the beans in this bean:  2
OOOEEE check out this bean:  String beans
Look at all the beans in this bean:  50
OOOEEE check out this bean:  List of diseases of the common bean
Look at all the beans in this bean:  4
OOOEEE check out this bean:  Bean golden mosaic virus
Look at all the beans in this bean:  0
OOOEEE check out this bean:  Organic beans
Look at all the beans in this bean:  55
OOOEEE check out this bean:  Pea bean
Look at all the beans in this bean:  14
OOOEEE check out this bean:  Beans
Look at all the beans in this bean:  114
OOOEEE check out this bean:  Bean (disambiguation)
Look at all the beans in this bean:  27
OOOEEE check out this bean:  BEAN (charity)
Look at all 

OOOEEE check out this bean:  William Bean (disambiguation)
Look at all the beans in this bean:  13
OOOEEE check out this bean:  Bean Brothers
Look at all the beans in this bean:  73
OOOEEE check out this bean:  Joe Bean
Look at all the beans in this bean:  5
OOOEEE check out this bean:  William Bean
Look at all the beans in this bean:  10
OOOEEE check out this bean:  Bean Station, Tennessee
Look at all the beans in this bean:  27
OOOEEE check out this bean:  1972 Bean Station, Tennessee bus crash
Look at all the beans in this bean:  2
OOOEEE check out this bean:  Battle of Bean's Station
Look at all the beans in this bean:  16
OOOEEE check out this bean:  Bean's Station
Look at all the beans in this bean:  27
OOOEEE check out this bean:  William Bean (geologist)
Look at all the beans in this bean:  4
OOOEEE check out this bean:  William Bennett Bean
Look at all the beans in this bean:  9
OOOEEE check out this bean:  William Jackson Bean
Look at all the beans in this bean:  8
OOOEEE che

OOOEEE check out this bean:  Cocoa bean
Look at all the beans in this bean:  41
OOOEEE check out this bean:  Coffee bean
Look at all the beans in this bean:  64
OOOEEE check out this bean:  The Coffee Bean & Tea Leaf
Look at all the beans in this bean:  27
OOOEEE check out this bean:  List of bean-to-bar chocolate manufacturers
Look at all the beans in this bean:  5
OOOEEE check out this bean:  Omanhene Cocoa Bean Company
Look at all the beans in this bean:  5
OOOEEE check out this bean:  Talk:Omanhene Cocoa Bean Company
Look at all the beans in this bean:  0
OOOEEE check out this bean:  Nacional (cocoa bean)
Look at all the beans in this bean:  5
OOOEEE check out this bean:  Talk:Cocoa bean
Look at all the beans in this bean:  21
OOOEEE check out this bean:  Cacao bean
Look at all the beans in this bean:  41
OOOEEE check out this bean:  Talk:Cocoa bean/Archive 1
Look at all the beans in this bean:  0
OOOEEE check out this bean:  Talk:Cocoa bean/Archive A
Look at all the beans in this 

In [13]:
print("Total Beans Searched: {}\nTotal Beans Referenced: {}".format(number_of_bean_pages,number_of_bean_references))

Total Beans Searched: 321
Total Beans Referenced: 7401


__look at all of that bean knowledge__

Now, let's dig deeper: what are the main topics of these many beans? Is it legumes? Cooking? The Magical Fruit?

WikiBeania is here to help answer these questions.

In [14]:
# more helper beans
from sklearn.feature_extraction.text import CountVectorizer as BeanCountVectorizer
from nltk.stem import SnowballStemmer as BeanStemmer
import nltk as beanltk
import numpy as nbean
from nltk.probability import FreqDist as BeanFreqDist
# one-time setup if the NLTK data is missing:
# beanltk.download('stopwords'); beanltk.download('punkt')
stop_beans = set(beanltk.corpus.stopwords.words('english'))
#bean_stemmer = BeanStemmer(language='english')

In [15]:
# a little helper bean to help clean up the bean knowledge
def clean_my_beans(bean_text, stop_beans):
    bean_text = bean_text.lower()
    # swap punctuation for spaces, then squeeze runs of whitespace
    bean_text = clean_beans.sub(r'[^\w\s]', ' ', bean_text)
    bean_text = clean_beans.sub(r'\s+', ' ', bean_text)
    # glue a stray single letter back onto the word before it ("bean s" -> "beans")
    bean_text = clean_beans.sub(r'\s([^ai\d]\s)', r'\1', bean_text)
    # drop any remaining lone character other than "a" or "i"
    bean_text = clean_beans.sub(r' [^ai] ', ' ', bean_text)
    bean_text = beanltk.word_tokenize(bean_text)
    # keep only the words that are not stop words
    new_bean_text = [bean for bean in bean_text if bean not in stop_beans]
    return ' '.join(new_bean_text)

In [16]:
clean_bean_knowledge = [clean_my_beans(bean_knowledge, stop_beans) for bean_knowledge in bean_text]

In [17]:
clean_bean_knowledge[0][:100]

'bean seed one several genera flowering plant family fabaceae used vegetables human animal food cooke'

__Ah clean beans are my favorite__
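
To make the cleaning steps easier to follow, here is a small sketch that applies the same regular expressions to one made-up sentence. It is not the notebook's function: `str.split()` stands in for NLTK's `word_tokenize`, and `toy_stop_words` is a tiny hypothetical stop list rather than NLTK's English list, so it runs without any downloads.

```python
import re

# Tiny stand-in for NLTK's English stop word list (an assumption, not the real list).
toy_stop_words = {'the', 'of', 'is', 'a', 'in'}

def clean_text_sketch(text, stop_words):
    text = text.lower()
    text = re.sub(r'[^\w\s]', ' ', text)          # punctuation -> space
    text = re.sub(r'\s+', ' ', text)              # collapse runs of whitespace
    text = re.sub(r'\s([^ai\d]\s)', r'\1', text)  # "bean s" -> "beans"
    text = re.sub(r' [^ai] ', ' ', text)          # drop remaining lone characters
    tokens = text.split()                         # stand-in for word_tokenize
    return ' '.join(t for t in tokens if t not in stop_words)

sample = "The Battle of Bean's Station is in Tennessee."
cleaned = clean_text_sketch(sample, toy_stop_words)
print(cleaned)  # battle beans station tennessee
```

Note the side effect: the possessive "Bean's" ends up as "beans", so possessives get folded into the plural count further down.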

Now let's take a look at what these beans are talking about

In [18]:
bean_dist = BeanFreqDist(' '.join(clean_bean_knowledge).split(' '))
the_bean_talk = bean_dist.most_common(50)

In [19]:
the_bean_talk

[('bean', 4440),
 ('beans', 2932),
 ('also', 976),
 ('used', 664),
 ('mr', 657),
 ('one', 514),
 ('first', 480),
 ('green', 428),
 ('tofu', 399),
 ('cocoa', 393),
 ('red', 390),
 ('series', 389),
 ('may', 384),
 ('soy', 381),
 ('two', 377),
 ('called', 371),
 ('made', 368),
 ('new', 362),
 ('common', 344),
 ('film', 339),
 ('many', 337),
 ('known', 319),
 ('time', 293),
 ('seeds', 287),
 ('united', 286),
 ('cooked', 285),
 ('black', 282),
 ('soybeans', 277),
 ('protein', 276),
 ('american', 270),
 ('soybean', 266),
 ('production', 264),
 ('often', 263),
 ('episode', 262),
 ('dish', 256),
 ('white', 255),
 ('references', 244),
 ('plant', 241),
 ('food', 240),
 ('broad', 234),
 ('long', 232),
 ('paste', 230),
 ('world', 226),
 ('states', 225),
 ('rice', 225),
 ('however', 218),
 ('dry', 217),
 ('like', 216),
 ('varieties', 214),
 ('cooking', 213)]

__OOOEEE Beans__ look at what these beans are talking about. Lots of references to beans, who woulda thought! Oh, protein too, oooeeee beans. "mr" must be referring to Mr. Bean. Oh boy, never seen that movie but I heard it's beanin.
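
NLTK's `FreqDist` is a subclass of Python's `collections.Counter`, so the word tally above can be sketched with the standard library alone. The `toy_docs` list below is hypothetical and only mimics the space-joined form of `clean_bean_knowledge`.

```python
from collections import Counter

# Hypothetical cleaned documents, invented for illustration.
toy_docs = ['bean seed plant bean', 'baked beans bean dish', 'mr bean film']

# Join all documents, split on whitespace, and tally each word.
word_counts = Counter(' '.join(toy_docs).split())
print(word_counts.most_common(1))  # [('bean', 4)]
print(word_counts['beans'])        # 1
```

As in the real output, "bean" and "beans" are tallied separately. Running the words through a stemmer first (such as the commented-out `BeanStemmer`) would be one way to merge them.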

Looking at single-word frequency beantributions doesn't do justice to the vast knowledge of beans, so let's take a look at some phrases to really see what these beans are talking about

In [20]:
bean_vectorizer = BeanCountVectorizer(ngram_range=(2,10)) # phrases between two and ten words
bean_frequencies = bean_vectorizer.fit_transform(clean_bean_knowledge)
bean_phrase_freqs = nbean.ravel(bean_frequencies.sum(axis=0))

In [21]:
# order the phrases by column index so they line up with bean_phrase_freqs
bean_phrases = [k for k, v in sorted(bean_vectorizer.vocabulary_.items(), key=lambda bean_item: bean_item[1])]

In [28]:
import pandas as beandas
bean_phrase_df = beandas.DataFrame({
    'bean_phrases': bean_phrases,
    'bean_phrase_freqs': bean_phrase_freqs,
})

In [29]:
bean_phrase_df.sort_values(by = ['bean_phrase_freqs'], ascending = False, inplace = True)
bean_phrase_df.reset_index(drop = True, inplace = True)

In [31]:
bean_phrase_df.iloc[:50]

| | bean_phrases | bean_phrase_freqs |
|---|---|---|
| 0 | mr bean | 514 |
| 1 | united states | 207 |
| 2 | external links | 185 |
| 3 | broad beans | 154 |
| 4 | green beans | 142 |
| 5 | baked beans | 141 |
| 6 | bean paste | 138 |
| 7 | references external | 133 |
| 8 | references external links | 133 |
| 9 | red bean | 128 |


__oh my beans, beans are for sure talking about beans!!__ What is bean phaseolus? Or bean soup, don't mind if I do! OOOEEE Rowan Atkinson must be the Mr. Bean. He knows what we are talking about. The United States and United Kingdom must like their beans.
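
For anyone wondering what the phrase counting actually tallies, here is a rough, dependency-free stand-in for `CountVectorizer` with an `ngram_range`. It is not scikit-learn's implementation: the `count_phrases` helper and `toy_docs` are hypothetical, and plain whitespace splitting is a simplification (scikit-learn's default tokenizer, for one, drops single-character tokens).

```python
from collections import Counter

def count_phrases(docs, min_n=2, max_n=3):
    """Count every run of min_n..max_n consecutive words, summed over all docs."""
    counts = Counter()
    for doc in docs:
        words = doc.split()
        for n in range(min_n, max_n + 1):
            # slide a window of n words across the document
            for i in range(len(words) - n + 1):
                counts[' '.join(words[i:i + n])] += 1
    return counts

# Hypothetical cleaned documents, invented for illustration.
toy_docs = ['mr bean film mr bean', 'references external links']
phrase_counts = count_phrases(toy_docs)
print(phrase_counts.most_common(1))  # [('mr bean', 2)]
```

Because shorter phrases are counted inside longer ones, overlapping entries can share a count, which would explain why "references external" and "references external links" both show 133 above.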

## That's enough beans for now

Anyways, WikiBeania is proud to support future bean knowledge initiatives. Please feel free to reach out anytime with questions or to gain access to the WikiBeania foundation. Also, please send us some bean donations and help set bean knowledge free for all!