# Instructions

## Description

Open the file ks-projects-201801.csv, which lists about 100,000 Kickstarter projects. Load the data directly into a Mongo database using the Python API.

Be sure to set the document ID manually. Also make sure to cast the data to appropriate types so that you can take advantage of the methods Mongo implements. You do not necessarily need the whole dataset; it is up to you to design your data model.

## Questions

- 1) Retrieve the 5 projects that received the most pledges.
- 2) Count the number of projects that reached their goal.
- 3) Count the number of projects in each category.
- 4) Count the number of French projects launched before 2016.
- 5) Retrieve the American projects that asked for more than 200,000 dollars.
- 6) Count the number of projects with "Sport" in their name.

In [109]:
import pandas as pd
import pymongo

In [110]:
client = pymongo.MongoClient('mongo:27017')
database = client['exercices']
collection = database['kickstarter']

In [111]:
database.command('ping')

{'ok': 1.0}

In [112]:
df_ks = pd.read_csv("./data/ks-projects-201801-sample.csv")
df_ks.head()

  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,


Unnamed: 0,ID,name,category,main_category,currency,deadline,goal,launched,pledged,state,backers,country,usd pledged,usd_pledged_real
0,872782264,"Scott Cooper's Solo CD ""A Leg Trick"" (Canceled)",Rock,Music,USD,2011-09-16,2000,2011-08-17 06:31:31,1145,canceled,24,US,1145.0,1145.0
1,1326492673,Ohceola jewelry,Fashion,Fashion,USD,2012-08-22,18000,2012-07-23 20:46:48,1851,failed,28,US,1851.0,1851.0
2,1688410639,Sluff Off & Harald: Two latest EGGs are Classi...,Tabletop Games,Games,USD,2016-07-19,2000,2016-07-01 21:55:54,7534,successful,254,US,3796.0,7534.0
3,156812982,SketchPlanner: Create and Plan- all in one bea...,Art Books,Publishing,USD,2017-09-27,13000,2017-08-28 15:47:02,16298,successful,367,US,2670.0,16298.0
4,1835968190,Proven sales with custom motorcycle accessories,Sculpture,Art,CAD,2016-02-24,5000,2016-01-25 17:37:10,1,failed,1,CA,0.708148,0.738225


This warning appears when pandas cannot infer a single data type for a column. Helpfully, it names the columns concerned: 6, 8, 10 and 12.

In [113]:
df_ks.columns[[6,8,10,12]]

Index(['goal', 'pledged', 'backers', 'usd pledged'], dtype='object')
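
One possible way to avoid the mixed-type warning altogether is to read the affected columns as strings and convert them explicitly afterwards. A minimal sketch on a tiny made-up sample (the rows below are hypothetical, not taken from the dataset):

```python
import io
import pandas as pd

# Tiny made-up sample standing in for the real CSV file.
raw = io.StringIO(
    "ID,goal,pledged,backers\n"
    "1,2000,1145,24\n"
    "2,not-a-number,10,3\n"
)

# Read the ambiguous columns as strings so pandas has nothing to guess.
mixed_columns = ["goal", "pledged", "backers"]
df = pd.read_csv(raw, dtype={name: str for name in mixed_columns})

# Convert explicitly; values that cannot be parsed become missing values.
for name in mixed_columns:
    df[name] = pd.to_numeric(df[name], errors="coerce")
```

The explicit conversion is the same `pd.to_numeric(..., errors='coerce')` call used in the cleaning step of this notebook; the only difference is that the read itself no longer has to guess.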

## Question 0

In [114]:
df_ks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150000 entries, 0 to 149999
Data columns (total 14 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   ID                150000 non-null  int64  
 1   name              149998 non-null  object 
 2   category          150000 non-null  object 
 3   main_category     150000 non-null  object 
 4   currency          150000 non-null  object 
 5   deadline          150000 non-null  object 
 6   goal              150000 non-null  object 
 7   launched          150000 non-null  object 
 8   pledged           150000 non-null  object 
 9   state             150000 non-null  object 
 10  backers           150000 non-null  object 
 11  country           150000 non-null  object 
 12  usd pledged       148518 non-null  object 
 13  usd_pledged_real  150000 non-null  float64
dtypes: float64(1), int64(1), object(12)
memory usage: 16.0+ MB


### Clean the data

In [120]:
df_ks.isna().sum()

ID                  0
name                0
category            0
main_category       0
currency            0
deadline            1
goal                1
launched            1
pledged             1
state               0
backers             1
country             0
usd pledged         1
usd_pledged_real    0
dtype: int64

In [126]:
import numpy as np

In [127]:
df_ks['deadline'] = pd.to_datetime(df_ks['deadline'],errors='coerce')
df_ks['launched'] = pd.to_datetime(df_ks['launched'],errors='coerce')
df_ks['goal'] = pd.to_numeric(df_ks['goal'],errors = 'coerce')
df_ks['pledged'] = pd.to_numeric(df_ks['pledged'], errors = 'coerce')
df_ks['backers'] = pd.to_numeric(df_ks['backers'],errors = 'coerce')
df_ks['usd pledged'] = pd.to_numeric(df_ks['usd pledged'],errors = 'coerce')

In [128]:
df_ks.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 148516 entries, 0 to 149999
Data columns (total 14 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   ID                148516 non-null  int64         
 1   name              148516 non-null  object        
 2   category          148516 non-null  object        
 3   main_category     148516 non-null  object        
 4   currency          148516 non-null  object        
 5   deadline          148515 non-null  datetime64[ns]
 6   goal              148515 non-null  float64       
 7   launched          148515 non-null  datetime64[ns]
 8   pledged           148515 non-null  float64       
 9   state             148516 non-null  object        
 10  backers           148515 non-null  float64       
 11  country           148516 non-null  object        
 12  usd pledged       148515 non-null  float64       
 13  usd_pledged_real  148516 non-null  float64       
dtypes: d

### Import the data

In [136]:
collection.insert_many(df_ks.replace({np.nan: None}).to_dict(orient='records'))

<pymongo.results.InsertManyResult at 0x7f7f24dfda80>
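
The instructions ask for the document ID to be set manually, whereas the insert above lets Mongo generate an `ObjectId`. A minimal sketch of one way to do it, assuming the Kickstarter `ID` column is unique; `to_mongo_document` is a hypothetical helper name, and the second row's missing value is made up for illustration:

```python
import math

def to_mongo_document(record):
    """Return a copy of one row (a dict) shaped for insertion into Mongo."""
    document = {}
    for key, value in record.items():
        # Store missing values as null rather than NaN.
        if isinstance(value, float) and math.isnan(value):
            value = None
        document[key] = value
    # Reuse the Kickstarter project ID as the primary key.
    document["_id"] = document.pop("ID")
    return document

rows = [
    {"ID": 872782264, "name": "Scott Cooper's Solo CD", "usd pledged": 1145.0},
    {"ID": 1326492673, "name": "Ohceola jewelry", "usd pledged": float("nan")},
]
documents = [to_mongo_document(row) for row in rows]
# collection.insert_many(documents) would then store them under these _id values.
```

With this approach, inserting the same project twice fails with a duplicate-key error instead of silently creating a second copy.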

In [137]:
collection.count_documents({})

148516

In [138]:
collection.find_one()

{'_id': ObjectId('61af600bc761ef04a322f614'),
 'ID': 872782264,
 'name': 'Scott Cooper\'s Solo CD "A Leg Trick" (Canceled)',
 'category': 'Rock',
 'main_category': 'Music',
 'currency': 'USD',
 'deadline': datetime.datetime(2011, 9, 16, 0, 0),
 'goal': 2000.0,
 'launched': datetime.datetime(2011, 8, 17, 6, 31, 31),
 'pledged': 1145.0,
 'state': 'canceled',
 'backers': 24.0,
 'country': 'US',
 'usd pledged': 1145.0,
 'usd_pledged_real': 1145.0}

## Question 1  

In [139]:
cursor = collection.find().limit(5).sort([("pledged", -1)])

for document in cursor :
    print('-----')
    print(document["name"],": ",document["pledged"])

-----
COOLEST COOLER: 21st Century Cooler that's Actually Cooler :  13285226.36
-----
Pebble 2, Time 2 + All-New Pebble Core :  12779843.49
-----
Expect the Unexpected. digiFilmï¿½ Camera by YASHICA :  10035296.0
-----
OUYA: A New Kind of Video Game Console :  8596474.58
-----
The Everyday Backpack, Tote, and Sling :  6565782.5
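
One caveat: `pledged` is expressed in each project's own currency (the CAD row in the preview has `pledged` of 1 but `usd_pledged_real` of 0.738225), so ranking on it mixes currencies. A minimal sketch of the same query as an aggregation pipeline with a configurable sort field; `top_projects_pipeline` is a hypothetical helper name:

```python
def top_projects_pipeline(field="pledged", n=5):
    """Build an aggregation pipeline returning the n projects with the highest `field`."""
    return [
        {"$sort": {field: -1}},   # highest value first
        {"$limit": n},            # keep only the top n
        {"$project": {"_id": 0, "name": 1, field: 1}},  # trim the output
    ]

pipeline = top_projects_pipeline("usd_pledged_real", 5)
# for document in collection.aggregate(pipeline):
#     print(document["name"], ":", document["usd_pledged_real"])
```

Whether the ranking actually changes with `usd_pledged_real` depends on the data; the sketch only makes the choice explicit.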


## Question 2

In [149]:
cursor = collection.find({'$where': "this.pledged > this.goal"})
len(list(cursor))

51915
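
Two remarks on this query: the strict `>` leaves out projects that hit their goal exactly, and `$where` runs JavaScript for every document. A possible alternative, assuming a MongoDB version that supports `$expr`; the helper names are hypothetical, and the count it returns may differ from the one above because of the `>=` comparison:

```python
def reached_goal_filter():
    """Filter for documents whose pledged amount is at least the goal."""
    return {
        # Skip documents where either amount is missing, since null values
        # would otherwise take part in the comparison.
        "pledged": {"$ne": None},
        "goal": {"$ne": None},
        "$expr": {"$gte": ["$pledged", "$goal"]},
    }

def count_reached_goal(collection):
    # count_documents counts on the server instead of loading every
    # matching document into Python just to measure the list length.
    return collection.count_documents(reached_goal_filter())
```

Counting documents whose `state` is `successful` would be another reading of the question; the two counts are not guaranteed to agree.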

## Question 3

In [141]:
cursor = collection.aggregate([{"$group" : {"_id" : "$category", "project_numbers" : {"$sum" : 1}}}])

for document in cursor :
    print('-----')
    print(document["_id"],": ",document["project_numbers"])

-----
Illustration :  1263
-----
Radio & Podcasts :  349
-----
Community Gardens :  115
-----
Accessories :  1203
-----
Punk :  127
-----
Photobooks :  608
-----
Zines :  144
-----
Textiles :  105
-----
Rock :  2707
-----
Video Art :  65
-----
Literary Journals :  118
-----
Chiptune :  10
-----
Gaming Hardware :  178
-----
Puzzles :  95
-----
R&B :  172
-----
Webcomics :  259
-----
Faith :  439
-----
Experimental :  357
-----
Events :  321
-----
Young Adult :  328
-----
Small Batch :  701
-----
Embroidery :  49
-----
Fashion :  3379
-----
Product Design :  8885
-----
Classical Music :  1064
-----
Hardware :  1430
-----
Romance :  74
-----
Apparel :  2827
-----
Drama :  871
-----
3D Printing :  271
-----
People :  440
-----
Music :  5293
-----
Grace is Leaving :  1
-----
Nonfiction :  3390
-----
Woodworking :  433
-----
Art Books :  1065
-----
Music Videos :  299
-----
Playing Cards :  963
-----
Comedy :  923
-----
Residencies :  32
-----
Horror :  525
-----
Science Fiction :  274
-----
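
The output above comes back in no particular order. If a ranked list is easier to read, the `$sortByCount` stage groups and sorts in one step. A minimal sketch; `projects_per_field_pipeline` is a hypothetical helper name:

```python
def projects_per_field_pipeline(field="category"):
    """Build a pipeline counting projects per value of `field`, largest group first."""
    # Each result document looks like {"_id": <value>, "count": <number>}.
    return [{"$sortByCount": "$" + field}]

pipeline = projects_per_field_pipeline("category")
# for document in collection.aggregate(pipeline):
#     print(document["_id"], ":", document["count"])
```

Note that the counter field is named `count` here rather than `project_numbers`.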

## Question 4

In [143]:
import datetime
year2016 = datetime.datetime(2016,1,1)
cursor = collection.find({"$and":[{"country":"FR"}, {"launched":{"$lt":year2016}}]})
len(list(cursor))

330
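
The same count can be written without `$and`, since several keys in one filter document are combined with AND implicitly. A minimal sketch; `french_before_filter` is a hypothetical helper name:

```python
import datetime

def french_before_filter(year=2016):
    """Filter for French projects launched strictly before 1 January of `year`."""
    cutoff = datetime.datetime(year, 1, 1)
    # Both conditions must hold; no explicit $and is needed.
    return {"country": "FR", "launched": {"$lt": cutoff}}

query = french_before_filter(2016)
# collection.count_documents(query) returns the count directly.
```

This only works because `launched` was stored as a real date during cleaning; a string column would not compare correctly against a `datetime`.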

## Question 5

In [144]:
cursor = collection.find({"$and":[{"country":"US"}, {"goal":{"$gt":200000}}]})

for document in cursor :
    print('-----')
    print(document["name"],": ",document["goal"])

-----
A CALL TO ADVENTURE :  287000.0
-----
Storybricks, the storytelling online RPG :  250000.0
-----
Shine On New World :  300000.0
-----
Nightclub :  3000000.0
-----
Nastaran (Wild Rose) :  250000.0
-----
Hubo - Extension Box for iPhone :  250000.0
-----
Baja ATV Park (Suspended) :  300000.0
-----
Chihuly Installation for Orlando (Pulse Nightclub) :  1000000.0
-----
Kurt Vonnegut: Unstuck in Time :  250000.0
-----
The LAKE HOPPER is a VTOL Flying Water Craft Made in America :  3000000.0
-----
"Hill" (11 For 11) (From the writer of Rudy & Hoosiers) :  1000000.0
-----
Saints of The Classroom :  250000.0
-----
Hemingwrite - A Distraction Free Digital Typewriter :  250000.0
-----
Breakfast 24/7 :  1000000.0
-----
FJE REVOLT :  500000.0
-----
Austin City Limits 40 Year History Documentary Film :  400000.0
-----
Ozark Mountain Ranch :  466000.0
-----
STEM Lesson Plan - BotBrainï¿½ Educational Products :  279646.0
-----
Guitar Godz VR: A 3D Rock Music Game :  500000.0
-----
The world's fir

FLOW: Life Unplugged-High Definition Bluetooth +APTX Adapter :  250000.0
-----
Pollution Fighting, with Lawnmower Lighting! Cooler Mowing. :  250000.0
-----
Nostalgia Arcade :  525000.0
-----
Beautify our Highways :  250000.0
-----
Flynn's Pizza :  250000.0
-----
DANCE, DANCE - Guinness World Record Dance Event :  900000.0
-----
"8 Songs" Documentary - Did music shape human evolution? :  240000.0
-----
Janus World Summit & Festival: recognizing talent worldwide :  265000.0
-----
Vluxe: Fashion Like No Other On Earth. :  300000.0
-----
Me2 :  4000000.0
-----
Sky Bridge: Fallen Stars :  250000.0
-----
Swimming With Guppies :  260000.0
-----
Grown Folk :  498129.55
-----
MaxMyTV: Home Automation and Social Media Overlay on your TV :  250000.0
-----
THE SESSIONS - The Beatles at Abbey Road Studios (Canceled) :  375000.0
-----
Casetop - Every Phone Becomes a Laptop :  300000.0
-----
"Madam Mississippi" :  250000.0
-----
How to use your voice powerfully & everyday, made easy! :  400000.0
---

-----
Mustard Pancakes - Oogleberry's Journey Home :  1400000.0
-----
Byson farm and eatery :  500000.0
-----
The Clambake: The History of the Pebble Beach Pro-Am :  1350000.0
-----
The Manson Obsession (Canceled) :  2000000.0
-----
Dolphinese! Talk to Dolphins in their own native language! :  1171000.0
-----
2d3D HierSwipe :  527000.0
-----
A Hard Place :  350000.0
-----
Monkey Light: The most effective way to be seen on your bike :  220000.0
-----
CourEntertainment LLC's Video Game Project: "Space-opolis" :  2464670.0
-----
Ekawa I - The Motion Picture :  8000000.0
-----
Morons4Money - Set Bob On Fire (Suspended) :  1000000.0
-----
GAMMA Fight Club :  5759042.0
-----
Black On Black Murder in America! Stoping This Madness! :  250000.0
-----
Sit and Be Fit - Keep The Dream Alive (Canceled) :  250000.0
-----
THE AFTERLIFE FILES :  500000.0
-----
Timothy Zahn's Parallax (turn-based 4X galactic conquest) :  350000.0
-----
Save the Victory Theatre (Canceled) :  1000000.0
-----
One Good Dee

-----
www.GuysDoLunch.com :  500000.0
-----
PetWashSPA: The first home SPA system for your pets. :  220000.0
-----
imealhost.com :  250000.0
-----
FOCUS ON THE OCEAN :  350000.0
-----
Ground Branch :  425000.0
-----
Leviathans Online: Monsters In The Sky (Canceled) :  300000.0
-----
William Malone's THALLIUM'S BOX :  350000.0
-----
"PG" Reanimated Cartoons Tha Movie :  100000000.0
-----
SportsBucketList.net :  250000.0
-----
DMG Recording Studio :  250000.0
-----
ProTap: Draft Beer, Wine, and Cold Beverage Dispenser :  350000.0
-----
The Bobcat Exterior Rodent Bait Station :  225000.0
-----
Edesia: Fresh, homemade and delicious Italian food! :  500000.0
-----
Your Way Game Board :  250000.0
-----
A Gangster Movie called,  GET LOUIE :  2000000.0
-----
Stark Drive Electric Bike :  250000.0
-----
Atheism is Winning! (Canceled) :  500000.0
-----
GRANNY, The Movie (starring Jamie Kennedy) (Canceled) :  500000.0
-----
Wining and Dining on the Yacht :  250000.0
-----
The Citizen Body Camera w

-----
Bout Getting Paid Movie Project :  275000.0
-----
MySpotOnTheWorld.com (Canceled) :  5000000.0
-----
Legend in My Living Room (Reality-TV Project) :  3000000.0
-----
MegaBots: Giant Fighting Robots :  1800000.0
-----
FAREWELL TO FREEDOM a modern day western by Anita Waggoner :  1500000.0
-----
Girlz Who Dare 2 Dream :  308124.0
-----
Food Book Fair :  275000.0
-----
The D'mond Project :  2500000.0
-----
Caverns Deep :  233000.0
-----
Kawaii's Perfect American Doll (Canceled) :  250000.0
-----
WARP Wind Amplified Rotor Platforms :  375000.0
-----
Torment: Tides of Numenera :  900000.0
-----
Justyn's :  50000000.0
-----
TKO: The Pugilist's Journey :  750000.0
-----
Expedition Amelia: Finding & Documenting Amelia's Electra :  1960000.0
-----
TOUR BUS FOR JESUS :  1000000.0
-----
SPOOKY - Action / comedy/ horror film here to benefit you :  250000.0
-----
Star Fall Caf?, Borrego Springs, CA :  350000.0
-----
Cthulhu World Combat :  300000.0
-----
Greg Hastings' Tournament Paintball Ma

Save Edith Macefield' "Up House" from Demolition! :  205000.0
-----
CashBurger :  1115000.0
-----
Fallout shelter, Bomb shelter: From design to completion. :  398175.0
-----
Calamity :  1000000.0
-----
New55 FILM :  400000.0
-----
The Next Big Bait TV Show ,Season 1, Bass Edition :  225000.0
-----
Supercharged Hand - The True Beginning of Wearable Power! :  1000000.0
-----
The Dirty Girls Social Club Movie Project (Canceled) :  250000.0
-----
Angelz Landing :  1200000.0
-----
Sugar Bakery :  400000.0
-----
Her Married Lover :  250000.0
-----
Choices & Relations :  750000.0
-----
Inherent :  225000.0
-----
The secret teachings of the black mist :  2000000.0
-----
Let people turn off ads by taking pictures of products. :  2000000.0
-----
The Dogwood Inn, Bringing a new B&B to an Old River Town! :  430000.0
-----
Deathwave Vampire Film  Project. Five Years in the Making! :  525000.0
-----
Capsize of the San Mateo :  1000000.0
-----
No Day'z Later (Canceled) :  300000.0
-----
TEAM SWISH - 

Team NBS Studios :  275000.0
-----
Phone-2-PC :  1500000.0
-----
Chuck Palahniuk's Lullaby :  250000.0
-----
IOM :  250000.0
-----
IMAGINE A WORLD POWERED BY ITSELF! THE FUTURE OF ENERGY:USEI :  5000000.0
-----
FunZilla Family Entertainment Center :  375000.0
-----
Yo City - Personal-user Gaming :  2500000.0
-----
The Truth About The War In Heaven (Declaration of War) in 3D :  21474836.0
-----
EZSHOVELï¿½ - Stand Straighter, Less Bending :  250000.0
-----
Year Round Mountainside Greenhouse in Montana :  450000.0
-----
iri: the Search Engine/Social Network YOU demand :  575000.0
-----
Albany Art Supply Co. :  350000.0
-----
EurekaSpark! :  330000.0
-----
Shake shop- Fast Fruit Restaurant - A fresh idea! :  350000.0
-----
Ketchup & Mustard Slices :  500000.0
-----
Fire Squadron (Canceled) :  300000.0
-----
Island Princess :  250000.0
-----
Christmas Grudge :  300000.0
-----
The MART Channel - The Martial Arts Channel :  1500000.0
-----
Ice Cream Man 2 (Canceled) :  300000.0
-----
Hailan 

Key of Fate :  2900000.0
-----
coreXtreme Afterburner Core Bodyweight Fitness Machine :  1000000.0
-----
Renaissance Men :  1000000.0
-----
Martin Delgado's Flying Techniques :  2000000.0
-----
SANTO 7.13.15 G.M.O. Movie (Canceled) :  2000000.0
-----
Who Am I? Where Do I Come From? The Swiderski Story :  275000.0
-----
Getting down to business :  300000.0
-----
The Return Of The Bell Witch Movie :  100000000.0
-----
Howling Moon Productions Film :  400000.0
-----
"FIRST DOG: SECOND TERM" :  250000.0
-----
ZION (Canceled) :  500000.0
-----
MotoRide quickely on demand with low cost in one click app! :  1200000.0
-----
Unfinished Business:  A Ghost Story :  250000.0
-----
Ride Share :  3000000.0
-----
Shaolin Time Master/Back To Urantia :  237000.0
-----
TREE60 - Christmas Tree Stand "Rotate & Decorate" :  375000.0
-----
RenFest Comedy TV Show (Canceled) :  365000.0
-----
Ube WiFi Connected Smart Light Dimmer :  280000.0
-----
You could own my face...and why wouldn't you want that? :  250

## Question 6 

In [145]:
cursor = collection.find({"name" :{"$regex" : "Sport"}})
len(list(cursor))

321
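
Note that this regex is case-sensitive: it matches "Sports" but not "sport" or "SPORT". The sketch below illustrates the difference locally with Python's `re` module on a few sample names (the middle two are made up), next to the corresponding Mongo filters:

```python
import re

names = [
    "SportsBucketList.net",       # appears in the Question 5 output
    "Extreme sport documentary",  # made-up example
    "TRANSPORT app",              # made-up example
    "Ohceola jewelry",
]

# Same behaviour as {"name": {"$regex": "Sport"}}
case_sensitive = [n for n in names if re.search("Sport", n)]

# Same behaviour as {"name": {"$regex": "Sport", "$options": "i"}}
case_insensitive = [n for n in names if re.search("Sport", n, re.IGNORECASE)]

strict_filter = {"name": {"$regex": "Sport"}}
loose_filter = {"name": {"$regex": "Sport", "$options": "i"}}
# collection.count_documents(strict_filter) avoids building a list to count.
```

The case-insensitive variant catches more names but also unrelated words such as "TRANSPORT", so which one fits depends on how the question is read.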