mikeckennedy
diff --git a/Diff for: ‎transcripts/ch10-conclusion/1.txt
+23 b/Diff for: ‎transcripts/ch10-conclusion/1.txt
+23
diff --git a/Diff for: ‎transcripts/ch10-conclusion/2.txt
+18 b/Diff for: ‎transcripts/ch10-conclusion/2.txt
+18
diff --git a/Diff for: ‎transcripts/ch10-conclusion/3.txt
+33 b/Diff for: ‎transcripts/ch10-conclusion/3.txt
+33
diff --git a/Diff for: ‎transcripts/ch10-conclusion/4.txt
+63 b/Diff for: ‎transcripts/ch10-conclusion/4.txt
+63
diff --git a/Diff for: ‎transcripts/ch10-conclusion/5.txt
+48 b/Diff for: ‎transcripts/ch10-conclusion/5.txt
+48
diff --git a/Diff for: ‎transcripts/ch10-conclusion/6.txt
+72 b/Diff for: ‎transcripts/ch10-conclusion/6.txt
+72
diff --git a/Diff for: ‎transcripts/ch10-conclusion/7.txt
+46 b/Diff for: ‎transcripts/ch10-conclusion/7.txt
+46
@@ -0,0 +1,23 @@
+00:00 There it is, the finish line!
+00:03 That's right, you've made it all the way to the end of this course,
+00:05 I hope you found it super interesting and you've learned a lot,
+00:08 because I believe you now have enough to build production ready applications
+00:13 and deploy them based on MongoDB.
+00:15 So really, the big question you need to be asking yourself is
+00:17 what are you going to build now,
+00:19 you have this amazing new power, this amazing new database,
+00:22 and way of writing data driven applications, what are you going to build?
+00:24 I hope you take what you learned in this course,
+00:26 and you go build something amazing.
+00:28 Now, before you do leave, and you go build that thing,
+00:31 let's talk about a few wrap up details;
+00:33 first of all, make sure you get the materials from the GitHub repository,
+00:36 if you haven't already, go to
+00:38 github.com/mikeckennedy/mongodb-for-python-developers,
+00:42 the url is there at the bottom, and star this, and consider also forking it
+00:46 so you have a permanent version for yourself.
+00:49 As far as I know, the git materials are entirely finished and published,
+00:54 there is a chance that somebody will find a small bug
+00:57 somewhere in the course and I'll amend that,
+00:59 so very likely what you see at this GitHub repository is the final material,
+01:04 it's certainly what you saw me create online during these videos.
@@ -0,0 +1,18 @@
+00:01 Before we put the wraps on this course
+00:04 let's do a quick lightning review of each chapter that we've covered.
+00:06 We're certainly not going to cover everything that we covered in the chapter,
+00:09 this is just a really quick review, but maybe the main takeaway from each chapter.
+00:13 So we began the course by talking about what NoSQL is,
+00:17 and I think there's a little bit of a misunderstanding
+00:20 or maybe multiple definitions of what NoSQL means;
+00:23 sometimes people say it's not only SQL,
+00:26 sometimes people say it means that there's no SQL, the language, involved in this.
+00:31 Well, what we saw, looking at the history back in 2009,
+00:34 is that this concept of NoSQL came about by a meeting of people
+00:39 working on horizontally scalable types of databases,
+00:42 asking what trade-offs they make against relational databases,
+00:45 so that they are more easily horizontally scalable,
+00:48 and basically cluster friendly databases.
+00:50 In that world, it's not about whether there's no SQL
+00:53 or there is SQL in the language, it's really about the style of databases
+00:57 and the trade-offs around how they work with that data.
@@ -0,0 +1,33 @@
+00:01 The MongoDB shell and native query syntax;
+00:03 we saw that you start the MongoDB shell by typing the word 'mongo',
+00:07 and it just runs the shell and tries to talk to the local server;
+00:10 there are all the different ways to get it to connect to different servers, as we've seen.
+00:13 So once it starts you get this little greater-than prompt
+00:16 and you write JavaScript, so we interact with MongoDB at the lowest level
+00:22 in JavaScript in a textual way,
+00:24 and actually this is converted to BSON, a binary extended version of JSON.
+00:29 So here we type something like db so this is the database we have active
+00:33 and book would be the collection name
+00:35 or table if you're still thinking relationally, but the collection name,
+00:38 and we say things like find or count or sort, or things like this
+00:42 and what we give it is this prototypical JSON object
+00:45 and what we get back are all the things that match the elements of that prototype.
+00:50 So here you can see we got two records back
+00:53 and they both had the same title as the title we indicated here.
+00:56 So it's very much about passing these prototypical JSON documents,
+01:00 however sometimes we have to do more than just say
+01:04 I want basically equality in my search,
+01:07 I would like to express things like greater than.
+01:09 So this query here that we have written
+01:12 is actually doing a couple of very interesting things,
+01:14 maybe the thing that stands out the most is this greater than operator,
+01:17 so in $gte, the dollar indicates an operator,
+01:20 and gte is the name of the greater than or equal to operator,
+01:23 so instead of just saying ratings.value is nine,
+01:26 we're saying I'd like all the ratings where the value is either equal to or greater than nine.
+01:30 The other powerful and interesting thing here is
+01:33 we're actually traversing this hierarchy of the document
+01:36 we're going to find the ratings array, which is a list of subdocuments,
+01:39 each of which has a value as an integer,
+01:41 so we're actually reaching down inside that document
+01:44 and we're doing this query with this operator.
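As a rough sketch, the same two filters can be written as Python dictionaries, which is the form they take once we move from the shell to PyMongo. The field names `title` and `ratings.value` follow the narration; the helper names and the sample book are made up for illustration, and `has_rating_at_least` is only a plain-Python imitation of how the server treats an array of subdocuments, not MongoDB code.

```python
import json


def title_filter(title):
    """Prototype document: match books whose title equals the given one."""
    return {"title": title}


def min_rating_filter(minimum):
    """Dotted path into the ratings array plus the $gte operator."""
    return {"ratings.value": {"$gte": minimum}}


def has_rating_at_least(book, minimum):
    """Plain-Python imitation of the $gte filter on an array of subdocuments:
    a book matches if ANY rating subdocument has value >= minimum."""
    return any(
        "value" in rating and rating["value"] >= minimum
        for rating in book.get("ratings", [])
    )


# Hypothetical sample document, only here to exercise the helpers.
sample_book = {"title": "Example Book", "ratings": [{"value": 7}, {"value": 9}]}

print(json.dumps(min_rating_filter(9)))     # {"ratings.value": {"$gte": 9}}
print(has_rating_at_least(sample_book, 9))  # True
```

The point to notice is that the dotted key reaches into every element of the array, so one rating of nine or more is enough for the whole book to match.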
@@ -0,0 +1,63 @@
+00:00 Next up, we worked with PyMongo.
+00:02 So we put our JavaScript away, we said all right, enough with the JavaScript stuff,
+00:05 we're going to write in Python basically for the rest of this course.
+00:08 So the lowest level way to talk to MongoDB from Python is with PyMongo.
+00:13 So let's look at a couple of the CRUD operations here.
+00:16 We'll start of course by importing the package, import pymongo,
+00:20 and if you don't have it just pip install it;
+00:22 and then we need to create a Mongo client by passing a connection string,
+00:26 I believe if you actually get a hold of the PyMongo connection
+00:29 you can use it directly, but you should not, because the Mongo client handles
+00:34 reconnects and connection pooling and stuff like that,
+00:36 whereas the connection itself wouldn't do those kinds of things.
+00:39 Then if we want to work with the database,
+00:42 we have this sort of interesting highly dynamic api,
+00:46 we go to the client and we just say . (dot) the name of the database
+00:49 so we say client.the_small_bookstore, and we assign that to db
+00:54 so it looks like the rest of the shell stuff that we have been doing,
+00:57 but technically that's optional.
+00:59 This database doesn't even have to exist,
+01:02 we could create the database in this style just by doing our first insert into it.
+01:05 Whether or not it exists, we get hold of the database
+01:08 and now we can operate on the collections.
+01:11 Let's imagine that in that database there's a collection called books
+01:15 and we want to know how many of them there are,
+01:17 we would just say db.books.count
+01:20 and that would actually go there and do this operation.
+01:22 If it happens that either the database or the collection doesn't exist,
+01:25 it doesn't crash, you get zero.
+01:27 We could also do a find_one, this line here is notable
+01:31 because in the JavaScript API it is findOne
+01:34 and they've made a pythonic version here, so find_one
+01:39 just be aware that it's not always a one to one
+01:42 exact verbatim match from the native query syntax over to PyMongo.
+01:46 We can also do an actual search,
+01:50 before, we said find_one and basically got the first one;
+01:54 here we're going to say I want to find a book by isbn, so I pass it over;
+01:57 here we use Python dictionaries
+01:59 which play the role of those prototypical JSON objects.
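Here is a minimal sketch of those read operations. The database name `the_small_bookstore`, the `books` collection and the `isbn` field come from the narration; the helper names and the localhost connection string are assumptions for illustration. One thing to be aware of: recent PyMongo releases count with `count_documents`, because `Collection.count()` was removed in PyMongo 4.

```python
def get_bookstore_db(uri="mongodb://localhost:27017"):
    """Connect and return the database handle (needs pymongo and a server)."""
    import pymongo  # imported here so the helpers below load without it

    client = pymongo.MongoClient(uri)
    # Attribute access names the database; it is created on first insert.
    return client.the_small_bookstore


def count_books(db):
    """Count every document in the books collection (empty filter)."""
    return db.books.count_documents({})


def find_book_by_isbn(db, isbn):
    """A Python dict plays the role of the prototype JSON document."""
    return db.books.find_one({"isbn": isbn})
```

With a server running, usage would look like `db = get_bookstore_db()` followed by `count_books(db)`; `find_book_by_isbn` returns `None` when nothing matches.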
+02:01 We also insert new data, so here we're going to say
+02:06 insert this thing which is a dictionary, it has a title called new book
+02:10 and an isbn of whatever is written there and we get back this result,
+02:15 the result will have this object id in the field inserted_id,
+02:20 we can go requery it and do all sorts of stuff with it.
+02:23 Basically when we say insert_one, we get this result
+02:26 which, if it succeeds, has the inserted id.
+02:29 Now these are the straightforward CRUD operations,
+02:31 we can also use our fancy in-place operators,
+02:34 so here let's just insert this book, so we see what we get,
+02:36 and we grab a hold of the inserted id,
+02:38 and now suppose we want to add a field called favorited_by,
+02:43 and this is going to be a list, and we want the list to be basically distinct
+02:47 we're adding the ids of the customers or people visiting our site
+02:50 who have favorited this book, and we'd like to put them in there
+02:54 but there's no reason to have them in there twice,
+02:56 that can cause all sorts of problems.
+02:58 We're going to use the $addToSet operator, so we run this,
+03:01 run it again for 1002, and hey, we could run it a second time for 1002,
+03:05 and what we'll end up with is an object that looks like this:
+03:08 the two things we inserted, the generated _id,
+03:11 and this favorited_by list which has 1001 and 1002.
+03:15 Definitely keep in mind these in-place operators
+03:19 because they're very powerful and they leverage some of the special properties
+03:23 of the way MongoDB treats documents atomically.
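The insert and the in-place update could be sketched like this. `insert_one`, `inserted_id`, `update_one` and `$addToSet` are real PyMongo and MongoDB names; the helper functions are hypothetical, and `books` is assumed to be a PyMongo collection such as `db.books`.

```python
def insert_book(books, title, isbn):
    """Insert one document and return the generated _id."""
    result = books.insert_one({"title": title, "isbn": isbn})
    return result.inserted_id


def favorite_update(customer_id):
    """Update document that adds customer_id to favorited_by only if absent."""
    return {"$addToSet": {"favorited_by": customer_id}}


def add_favorite(books, book_id, customer_id):
    """Apply the update on the server; no read-modify-write in Python."""
    return books.update_one({"_id": book_id}, favorite_update(customer_id))
```

Calling `add_favorite` twice with the same customer id leaves a single entry in the list, which is what keeps `favorited_by` distinct without any check in application code.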
@@ -0,0 +1,48 @@
+00:01 Next up was document design.
+00:03 Some of the concepts and ideas of relational databases still apply here,
+00:07 you still are modeling data, you still put it into a database,
+00:10 but many of the techniques fall down,
+00:13 this whole concept of third normal form
+00:15 doesn't make nearly as much sense as it does in a relational database.
+00:18 What we focus on more often is really
+00:21 how do we make relationships either between documents or within documents.
+00:25 We saw the primary question, not the only one, but the most challenging one,
+00:30 the one you have to think most carefully about is to embed or not to embed,
+00:34 and I gave you a few rules or tips to help you guide this decision.
+00:38 One: is the embedded data wanted 80 percent of the time or more,
+00:44 that is, most of the time when you get that containing document?
+00:48 If that's true, you probably want to embed,
+00:51 if that's false, maybe consider that as a warning sign not to.
+00:54 How often do you want the embedded document without the outer containing document?
+00:59 If often what you really want to get access to is these little inside pieces,
+01:03 there's a lot of overhead and it really kind of complicates the way
+01:07 you access it through your application,
+01:09 if you want to get them most of the time, or frequently, on their own.
+01:13 Is the embedded data a bounded set?
+01:16 Remember, these documents can only be sixteen megabytes or smaller,
+01:19 and that limit is way higher than you really want your documents to get;
+01:22 if this is an unbounded set that you're going to continue to add to,
+01:25 it very easily could outgrow the actual size that you're allowed to store.
+01:28 Really for a performance reason though, is it a bounded set and is that set small?
+01:34 Because if you put huge amounts of data in there,
+01:36 you're going to really slow down your read time
+01:38 for these database operations that involve this document.
+01:41 These are the four main rules here,
+01:43 you also want to consider how your application accesses this data,
+01:47 it might be really easy to answer these four questions
+01:50 because there's a very constrained and small set of queries
+01:53 you run against your database;
+01:55 or it could be that you ask all sorts of questions in highly varied ways,
+01:59 in which case it's harder to answer those questions,
+02:02 the more types of queries you have the harder it is to know
+02:05 whether most of the time you want the embedded data for example.
+02:08 The more varied your queries are, the more you'll trend
+02:11 towards third normal form, relational style and less embedding.
+02:15 One of the situations where you have lots of varied queries is
+02:18 if you have this thing called an integration database,
+02:21 which we talked about sort of sharing a database across different applications,
+02:24 versus having one dedicated to a particular application
+02:27 where you can understand these questions very clearly.
+02:30 So when you're designing these documents
+02:33 you want to really think most carefully about do you want to embed this data
+02:36 or create a soft foreign key type of relationship.
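To make the two choices concrete, here is a small sketch of the same data in both shapes, using plain Python dictionaries. All field names and values are invented for illustration; only the embed versus soft foreign key distinction comes from the discussion above.

```python
# Option 1: embed. The ratings travel with the book in a single document.
embedded_book = {
    "_id": 1,
    "title": "Example Book",
    "ratings": [
        {"user_id": 1001, "value": 9},
        {"user_id": 1002, "value": 7},
    ],
}

# Option 2: soft foreign key. Ratings live in their own collection and
# point back at the book through book_id; nothing enforces the link.
book = {"_id": 1, "title": "Example Book"}
ratings = [
    {"_id": 10, "book_id": 1, "user_id": 1001, "value": 9},
    {"_id": 11, "book_id": 1, "user_id": 1002, "value": 7},
]


def ratings_for(book_doc, all_ratings):
    """Application-side lookup that stands in for a second query."""
    return [r for r in all_ratings if r["book_id"] == book_doc["_id"]]


print([r["value"] for r in ratings_for(book, ratings)])  # [9, 7]
```

The embedded shape answers "give me the book with its ratings" in one read, while the referenced shape keeps the book small and lets the ratings be queried on their own.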
@@ -0,0 +1,72 @@
+00:00 After we talked about document design
+00:03 and we talked about the raw access from PyMongo
+00:05 we said let's take this up a level of abstraction,
+00:08 let's actually build classes and map those over ORM style into MongoDB.
+00:14 We saw a really nice way to do that is with the ODM called MongoEngine.
+00:19 Let's review the main way that we sort of define classes
+00:23 and add constraints and things like that.
+00:25 Over here we are going to create this car object, this is our dealership example
+00:30 and we are going to store the car in the database.
+00:33 The way we create something that MongoEngine can manage
+00:37 in MongoDB as a top level document,
+00:40 is that we're going to derive from mongoengine.Document.
+00:43 And then every field is going to be one of these fundamental field types,
+00:46 like StringField, IntField, FloatField and so on.
+00:50 And we can have some of them required, the first three required,
+00:53 we can have some of them with basic default values, like mileage defaults to zero
+00:59 but we can also have interesting functions,
+01:01 for example the vin number is automatically generated
+01:04 and we're basing this on the uuid4 random alphanumeric thing,
+01:08 so what we have here so far is really sort of equivalent
+01:11 to what you might have in a traditional relational database,
+01:15 there's an entry and there is a flat set of what you would call columns,
+01:19 this is only part of the story,
+01:21 remember we can have nested documents,
+01:24 we can have actually a rich hierarchy of nested objects.
+01:27 One thing we might want to store in the car is an engine
+01:30 and the engine itself is a special type,
+01:33 here in the field it's going to be an EmbeddedDocumentField,
+01:36 and Engine derives from mongoengine.EmbeddedDocument,
+01:40 not Document, EmbeddedDocument.
+01:42 These we're never going to directly insert into the database,
+01:44 in fact, we're going to always put them into a car,
+01:48 so this is like a strong relationship between a car and its engine,
+01:51 we can even mark it as required.
+01:53 Now going a little further than that,
+01:55 our service history actually contains a list of subdocuments,
+01:58 each one modeled by the service record.
+02:00 The service record has things like the customer satisfaction,
+02:03 what service was performed and so on.
+02:06 Now if we take this, put some appropriate data into it and store it,
+02:10 we'll get something looking along the lines of this,
+02:12 in our document database in MongoDB,
+02:15 so here we have the first few elements that are just the flat fields
+02:18 and then we have the nested engine, one of them,
+02:21 we have the nested array of nested items for the service histories,
+02:24 and this really gets at the power of MongoDB,
+02:28 this nesting and these strong relationships
+02:31 where you get this aggregate object the car,
+02:34 that always contains everything we need to know about it.
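The stored shape described here can be sketched as a plain Python dictionary. This is not the MongoEngine class itself, only an approximation of the document it produces: `mileage`, `vin`, `engine`, `service_history` and the service record fields follow the narration, while `make`, `model`, `year`, the function names and the sample values are assumptions.

```python
import uuid


def new_vin():
    """Random identifier built from uuid4 (32 hexadecimal characters)."""
    return uuid.uuid4().hex


def car_document(make, model, year, engine, mileage=0, vin=None):
    """Build the nested dictionary a stored car roughly corresponds to."""
    return {
        "make": make,
        "model": model,
        "year": year,
        "mileage": mileage,       # simple default value
        "vin": vin or new_vin(),  # generated when not supplied
        "engine": engine,         # exactly one embedded subdocument
        "service_history": [],    # list of embedded service records
    }


# Hypothetical values, purely to show the nesting.
car = car_document("ExampleMake", "ExampleModel", 2017, {"horsepower": 200})
car["service_history"].append(
    {"description": "Oil change", "price": 99.0, "customer_rating": 4}
)
```

Everything about the car, including its engine and every service visit, ends up in this one aggregate document.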
+02:37 How about querying? We're not going to write in the low level API now,
+02:42 we're going to use basically the properties of these objects.
+02:46 Here's the function that we wrote where we wanted to ask the question
+02:49 what percentage of cars have a bad customer rating,
+02:53 that would be average or below,
+02:56 so we're going to go to the car and we say objects,
+02:58 we could do lots of these objects.filter.filter.filter
+03:02 but if you just have one query you can just stick it in objects,
+03:04 so we pass objects the service_history; now we can't say dot here,
+03:08 because service_history.customer_rating
+03:10 would not be a valid variable name or parameter name in Python,
+03:13 so we're going to traverse a hierarchy with a double underscore.
+03:17 We also might want to apply one of the operators,
+03:18 in this case we're going to say less than 4,
+03:21 so we're going to use again this double underscore,
+03:24 but in this case it's going to say on the left is the name of the target
+03:28 and on the right is the operator we're going to apply to it.
+03:31 You don't put the dollar again, that wouldn't be valid in Python,
+03:34 but double underscore __lt, and then we can ask
+03:38 things like count, or go and get the first one, or things like that.
+03:42 We can even do paging by slicing on that result.
+03:45 This syntax lets us use almost the entire spectrum of the ways of querying MongoDB
+03:50 in a really straightforward way, one that ties back to the car object that we defined.
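To see what the double underscores stand for, here is a simplified translation from keyword style to the raw filter document. It is an illustration only, not MongoEngine's actual implementation; `to_raw_filter` and the `OPERATORS` subset are hypothetical. In MongoEngine itself the query from the narration would be written as `Car.objects(service_history__customer_rating__lt=4).count()`.

```python
# Operator suffixes this sketch recognises (a subset chosen for illustration).
OPERATORS = {"lt", "lte", "gt", "gte", "ne", "in", "nin"}


def to_raw_filter(**conditions):
    """Turn keyword-style conditions into a MongoDB filter dictionary."""
    raw = {}
    for keyword, value in conditions.items():
        parts = keyword.split("__")
        if len(parts) > 1 and parts[-1] in OPERATORS:
            # Trailing piece is an operator: prefix it with the dollar sign.
            operator = parts.pop()
            raw[".".join(parts)] = {"$" + operator: value}
        else:
            # No operator: plain equality on the dotted path.
            raw[".".join(parts)] = value
    return raw


print(to_raw_filter(service_history__customer_rating__lt=4))
# {'service_history.customer_rating': {'$lt': 4}}
```

The double underscore does two jobs: between names it replaces the dot, and before an operator it replaces the dollar sign.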
@@ -0,0 +1,46 @@
+00:01 At this point, we pretty much had MongoDB
+00:03 doing everything we needed it to do,
+00:05 and we'd heard MongoDB was fast,
+00:07 but it turned out it didn't really seem to be behaving as quickly as maybe we hoped,
+00:11 we put a ton of data from our dealership in there,
+00:14 and we were getting query times of like one second, 700 milliseconds, stuff like that.
+00:17 It was okay, but really, we saw it can do much better.
+00:20 What levers and knobs do we have to turn to make this faster?
+00:24 The most important one, even more important than in relational databases,
+00:28 are the indexes; we saw that MongoEngine as well as PyMongo and the shell
+00:33 all have really good ways to deal with this.
+00:35 Document design is really important, mostly around this embedding question
+00:39 but there are many ways to think about document design,
+00:42 there's a lot of really non intuitive and powerful patterns,
+00:45 design patterns you can apply here.
+00:48 What is your query style, maybe one query is better than another
+00:51 and using projections to only pull back a subset of responses,
+00:56 suppose we have a car that has a ton of those service histories
+00:59 and we don't care about them for a particular query
+01:02 we could suppress returning those from the database
+01:04 which saves us a lot of bandwidth on the network,
+01:07 disk reads on the database server and deserialization processing on our side.
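A projection of that kind might look like this in PyMongo. The second argument to `find` is the projection document, and a value of 0 suppresses a field; the helper name is made up, and `cars` is assumed to be a PyMongo collection holding the dealership documents.

```python
def cars_without_history(cars, make):
    """Find cars by make but ask the server to leave out service_history."""
    projection = {"service_history": 0}  # 0 means "do not return this field"
    return list(cars.find({"make": make}, projection))
```

The filtering still happens on the server; the projection only trims what comes back over the wire.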
+01:11 We also saw there are some network topology things we can do,
+01:15 replication and sharding, and those are both interesting and powerful
+01:19 but not part of this course, so go check that out on your own if you're interested.
+01:23 For indexes, we took an example like our car
+01:27 and we said let's suppose we have make here
+01:30 that we're interested in querying by, and a service history,
+01:32 and if you look below, service history is defined as service record objects
+01:36 and they have a description and a customer rating
+01:39 and things like this, price for example,
+01:41 so our goal is to query these things, the make, the service history and stuff, quickly,
+01:45 so we saw that adding an index is a really powerful way to do that,
+01:48 so all we've got to do is go to our meta object, our meta element here
+01:52 and say these are the indexes, as an array;
+01:55 now these indexes can simply be the name of the thing,
+01:58 like make, that's super straightforward,
+02:01 they could traverse the hierarchy using the JavaScript style, using the dot,
+02:05 so we'll say service_history.customer_rating
+02:08 and that would go down and let us do queries deep into these cars
+02:12 and say let's find the ones that have either good or low customer ratings
+02:17 and we can even do composite indexes,
+02:19 so here we're having a composite index on price and description,
+02:22 within the service history, so we do that by having this fields dictionary thing
+02:27 and the fields are an array, so you can use the simple version
+02:29 or if you need to, you can get a more complex definition of the index there.
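Put together, the index definitions might look roughly like the dictionary below, which is the shape MongoEngine reads from a document class's `meta` attribute. The small helper is a hypothetical, simplified translation into the `(field, direction)` pairs that PyMongo's `create_index` accepts; it assumes ascending order everywhere and ignores the other options MongoEngine supports.

```python
# Shape of the meta dictionary placed on the Car class.
car_meta = {
    "indexes": [
        "make",                             # simple single-field index
        "service_history.customer_rating",  # dotted path into subdocuments
        {   # composite index on two nested fields
            "fields": ["service_history.price", "service_history.description"]
        },
    ]
}


def index_keys(spec):
    """Translate one index spec into (field, direction) pairs, ascending."""
    fields = [spec] if isinstance(spec, str) else spec["fields"]
    return [(name, 1) for name in fields]  # 1 is ascending order


for spec in car_meta["indexes"]:
    print(index_keys(spec))
```

Note that the index paths use the dot, unlike the double underscore in the query keywords, because here they are ordinary strings rather than Python parameter names.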