## **Exploring data with R - visualize data**

There's a wise saying that goes like this: a picture is worth a thousand rows and columns.

Actually, sorry, we made that up - there is no such wise saying 😄. But you get the gist of it, right?

In this notebook, we'll analyze data with basic statistics and visualize it using graphs built with `ggplot2`, a core member of the Tidyverse.

## **Loading our data**

Before we begin, let's load the same data about study hours that we analyzed in the previous notebook. We will also recalculate who passed, in the same way as last time. Run the code in the cell below by clicking the **► Run** button to see the data.


In [1]:
library(DBI)
library(RSQLite)
library(tidyverse)

In [2]:
mydb <- dbConnect(RSQLite::SQLite(), "")

In [6]:
# Read the board game ratings and details tables from the TidyTuesday repository
ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-25/ratings.csv')
details <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-25/details.csv')

# Show the first five rows of each table
ratings %>% 
  slice_head(n = 5)

details %>% 
  slice_head(n = 5)

Rows: 21831 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): name, url, thumbnail
dbl (7): num, id, year, rank, average, bayes_average, users_rated

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 21631 Columns: 23
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): primary, description, boardgamecategory, boardgamemechanic, boardg...
dbl (13): num, id, yearpublished, minplayers, maxplayers, playingtime, minpl...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_

num,id,name,year,rank,average,bayes_average,users_rated,url,thumbnail
<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>
105,30549,Pandemic,2008,106,7.59,7.487,108975,/boardgame/30549/pandemic,https://cf.geekdo-images.com/S3ybV1LAp-8SnHIXLLjVqA__micro/img/S4tXI3Yo7BtqmBoKINLLVUFsaJ0=/fit-in/64x64/filters:strip_icc()/pic1534148.jpg
189,822,Carcassonne,2000,190,7.42,7.309,108738,/boardgame/822/carcassonne,https://cf.geekdo-images.com/okM0dq_bEXnbyQTOvHfwRA__micro/img/VfLoKzfk3xj26ArmDu55qZ4sysw=/fit-in/64x64/filters:strip_icc()/pic6544250.png
428,13,Catan,1995,429,7.14,6.97,108024,/boardgame/13/catan,https://cf.geekdo-images.com/W3Bsga_uLP9kO91gZ7H8yw__micro/img/LA4OvGfQ_TXQ-2mhaIFZp2ITWpc=/fit-in/64x64/filters:strip_icc()/pic2419375.jpg
72,68448,7 Wonders,2010,73,7.74,7.634,89982,/boardgame/68448/7-wonders,https://cf.geekdo-images.com/RvFVTEpnbb4NM7k0IF8V7A__micro/img/9glsOs7zoTbkVpfDt5SHWJm-kRA=/fit-in/64x64/filters:strip_icc()/pic860217.jpg
103,36218,Dominion,2008,104,7.61,7.499,81561,/boardgame/36218/dominion,https://cf.geekdo-images.com/j6iQpZ4XkemZP07HNCODBA__micro/img/PVxqHWOLTb3n-4xe62LJadr_M0I=/fit-in/64x64/filters:strip_icc()/pic394356.jpg


num,id,primary,description,yearpublished,minplayers,maxplayers,playingtime,minplaytime,maxplaytime,⋯,boardgamefamily,boardgameexpansion,boardgameimplementation,boardgamedesigner,boardgameartist,boardgamepublisher,owned,trading,wanting,wishing
<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>
0,30549,Pandemic,"In Pandemic, several virulent diseases have broken out simultaneously all over the world! The players are disease-fighting specialists whose mission is to treat disease hotspots while researching cures for each of four plagues before they get out of hand.&#10;&#10;The game board depicts several major population centers on Earth. On each turn, a player can use up to four actions to travel between cities, treat infected populaces, discover a cure, or build a research station. A deck of cards provides the players with these abilities, but sprinkled throughout this deck are Epidemic! cards that accelerate and intensify the diseases' activity. A second, separate deck of cards controls the &quot;normal&quot; spread of the infections.&#10;&#10;Taking a unique role within the team, players must plan their strategy to mesh with their specialists' strengths in order to conquer the diseases. For example, the Operations Expert can build research stations which are needed to find cures for the diseases and which allow for greater mobility between cities; the Scientist needs only four cards of a particular disease to cure it instead of the normal five&mdash;but the diseases are spreading quickly and time is running out. If one or more diseases spreads beyond recovery or if too much time elapses, the players all lose. 
If they cure the four diseases, they all win!&#10;&#10;The 2013 edition of Pandemic includes two new characters&mdash;the Contingency Planner and the Quarantine Specialist&mdash;not available in earlier editions of the game.&#10;&#10;Pandemic is the first game in the Pandemic series.&#10;&#10;",2008,2,4,45,45,45,⋯,"['Components: Map (Global Scale)', 'Components: Multi-Use Cards', 'Digital Implementations: Board Game Arena', 'Game: Pandemic', 'Medical: Diseases', 'Occupation: Dispatcher', 'Occupation: Medic / Doctor / Nurses', 'Occupation: Researcher / Scientist', 'Region: The World']","['Pandemic: Gen Con 2016 Promos – Z-Force Team Member/Game Convention', 'Pandemic: In the Lab', 'Pandemic: On the Brink', 'Pandemic: Promo Roles', 'Pandemic: State of Emergency', 'Pandemic: Survival Promos – Crisis Mitigator/Relocation Specialist', 'Pandemie: Uitbreiding ""De Generalist""']","['Pandemic Legacy: Season 0', 'Pandemic Legacy: Season 1', 'Pandemic Legacy: Season 2', 'Pandemic: Fall of Rome', 'Pandemic: Hot Zone – Europe', 'Pandemic: Hot Zone – North America', 'Pandemic: Iberia', 'Pandemic: Reign of Cthulhu', 'Pandemic: Rising Tide', 'Pandemic: The Cure', 'World of Warcraft: Wrath of the Lich King']",['Matt Leacock'],"['Josh Cappel', 'Christian Hanisch', 'Régis Moulun', 'Chris Quilliams', 'Tom Thiel']","['Z-Man Games', 'Albi', 'Asmodee', 'Asmodee Italia', 'Asterion Press', 'Bergsala Enigma (Enigma)', 'Brain Games', 'Devir', 'Filosofia Éditions', 'Galápagos Jogos', 'Gém Klub Kft.', 'HaKubia', 'Hobby Japan', 'HomoLudicus', 'Jolly Thinkers', 'Kaissa Chess & Games', 'Korea Boardgames Co., Ltd.', 'Lacerta', 'Lautapelit.fi', 'Lifestyle Boardgames Ltd', 'MINDOK', 'Nordic Games GmbH', 'Paladium Games', 'Pegasus Spiele', 'Quined White Goblin Games', 'Rebel Sp. z o.o.', 'Siam Board Games', 'Stratelibri', 'Wargames Club Publishing', 'White Goblin Games', 'Zhiyanjia', 'Ігромаг', 'Взрослые дети']",168364,2508,625,9344
1,822,Carcassonne,"Carcassonne is a tile-placement game in which the players draw and place a tile with a piece of southern French landscape on it. The tile might feature a city, a road, a cloister, grassland or some combination thereof, and it must be placed adjacent to tiles that have already been played, in such a way that cities are connected to cities, roads to roads, etcetera. Having placed a tile, the player can then decide to place one of their meeples on one of the areas on it: on the city as a knight, on the road as a robber, on a cloister as a monk, or on the grass as a farmer. When that area is complete, that meeple scores points for its owner.&#10;&#10;During a game of Carcassonne, players are faced with decisions like: &quot;Is it really worth putting my last meeple there?&quot; or &quot;Should I use this tile to expand my city, or should I place it near my opponent instead, giving him a hard time to complete their project and score points?&quot; Since players place only one tile and have the option to place one meeple on it, turns proceed quickly even if it is a game full of options and possibilities.&#10;&#10;First game in the Carcassonne series.&#10;&#10;",2000,2,5,45,30,45,⋯,"['Cities: Carcassonne (France)', 'Components: Meeples (Black)', 'Components: Meeples (Blue)', 'Components: Meeples (Green)', 'Components: Meeples (Red)', 'Components: Meeples (Yellow)', 'Components: Wooden pieces & boards', 'Country: France', 'Digital Implementations: Board Game Arena', 'Game: Carcassonne', 'Region: Languedoc (France)']","['20 Jahre Darmstadt Spielt', 'Apothecaries (fan expansion for Carcassonne)', 'Apothecaries and Tithes (fan expansion for Carcassonne)', 'Die Bettler (Fan-Erweiterung für Carcassonne)', 'The Big Black Pig Escape (Fan expansion to Carcassonne)', 'Breweries (fan expansion for Carcassonne)', 'Carcassonne Maps: Benelux', 'Carcassonne Maps: Deutschland', 'Carcassonne Maps: France', 'Carcassonne Maps: Great Britain', 'Carcassonne Maps: Península 
Ibérica', 'Carcassonne Maps: Taiwan', 'Carcassonne Maps: USA East', 'Carcassonne Maps: USA West', 'Carcassonne: Bonusplättchen Spiel 2014', 'Carcassonne: Bonusplättchen Spiel 2015', 'Carcassonne: Bonusplättchen Spiel 2016', 'Carcassonne: Bonusplättchen Spiel 2017', 'Carcassonne: Bonusplättchen Spiel 2018', 'Carcassonne: Bonusplättchen Spiel 2019', 'Carcassonne: Bonusplättchen Spiel 2020', 'Carcassonne: Bonusplättchen Spiel 2021', 'Carcassonne: Castles in Germany', 'Carcassonne: Corn Circles II', 'Carcassonne: Cult, Siege & Creativity', 'Carcassonne: CutCassonne', 'Carcassonne: Darmstadt', 'Carcassonne: Das Labyrinth', 'Carcassonne: Der Tunnel', 'Carcassonne: Die Belagerer', 'Carcassonne: Die Katharer', 'Carcassonne: Die Kornkreise', 'Carcassonne: Die Märkte zu Leipzig', 'Carcassonne: Die Wahrsagerin', 'Carcassonne: Die Windrosen', 'Carcassonne: Easter in Carcassonne', 'Carcassonne: Expansion 1 – Inns & Cathedrals', 'Carcassonne: Expansion 10 – Under the Big Top', 'Carcassonne: Expansion 2 – Traders & Builders', 'Carcassonne: Expansion 3 – The Princess & The Dragon', 'Carcassonne: Expansion 4 – The Tower', 'Carcassonne: Expansion 5 – Abbey & Mayor', 'Carcassonne: Expansion 6 – Count, King & Robber', 'Carcassonne: Expansion 7 – The Catapult', 'Carcassonne: Expansion 8 – Bridges, Castles and Bazaars', 'Carcassonne: Expansion 9 – Hills & Sheep', 'Carcassonne: German Cathedrals', 'Carcassonne: GQ Promo Tiles', 'Carcassonne: Halb so Wild', 'Carcassonne: Halb so wild I', 'Carcassonne: Halb so wild II', 'Carcassonne: King & Scout', 'Carcassonne: Klöster in Deutschland', 'Carcassonne: Kreivi ja Kuningas', 'Carcassonne: La Porxada', 'Carcassonne: Little Buildings', 'Carcassonne: Mage & Witch', 'Carcassonne: Nikolaus-Zählleiste', 'Carcassonne: Spiel Doch Mini Expansion', 'Carcassonne: Spiel Doch! 
Expansion', 'Carcassonne: The Barber-Surgeons', 'Carcassonne: The City Gates', 'Carcassonne: The Count of Carcassonne', 'Carcassonne: The Cult', 'Carcassonne: The Ferries', 'Carcassonne: The Festival', 'Carcassonne: The Flying Machines', 'Carcassonne: The Fruit-Bearing Trees', 'Carcassonne: The Gifts', 'Carcassonne: The Gold Mines', 'Carcassonne: The Land Surveyors', 'Carcassonne: The Messengers', 'Carcassonne: The Peasant Revolts', 'Carcassonne: The Phantom', 'Carcassonne: The Plague', 'Carcassonne: The River', 'Carcassonne: The River II', 'Carcassonne: The Robbers', 'Carcassonne: The School', 'Carcassonne: The Signposts', 'Carcassonne: The Tollkeepers', 'Carcassonne: Watchtowers', 'Castles in Hungary (fan expansion to Carcassonne)', 'Circles in the Forest (fan expansion for Carcassonne)', 'Cleric And Serf (fan expansion to Carcassonne)', 'The Coast (fan expansion for Carcassonne)', 'Corn Circles 3 (fan expansion to Carcassonne)', 'Divided Cities (fan expansion for Carcassonne)', 'Dragon Hunters (fan expansion to Carcassonne)', 'Dragon Killers (fan expansion for Carcassonne)', 'Dragon Rider Slayer (fan expansion for Carcassonne: The Princess & The Dragon)', 'Drought and Pestilence (fan expansion for Carcassonne)', 'Eisenbahn (fan expansion for Carcassonne)', 'Die Eroberer (fan Expansion to Carcassonne)', 'Evergreen Forest (fan expansion to Carcassonne)', 'Family Feud (fan expansion to Carcassonne)', 'Fields and Vineyards (fan expansion for Carcassonne)', 'Fischerhütte (Fan-Erweiterung für Carcassonne)', 'Fisherman: Angler & Fish Farm (fan expansion for Carcassonne)', 'Fisherman: Swan Lake (fan expansion for Carcassonne)', 'Fisherman: Waterfalls (fan expansion for Carcassonne)', 'Fishing Boats (fan expansion to Carcassonne)', 'Forests (fan expansion for Carcassonne)', 'Forests: An Apple a Day (fan expansion for Carcassonne)', 'Forests: Fairy Tales (fan expansion for Carcassonne)', 'Forests: The Forest Fire (fan expansion for Carcassonne)', 'Forests: Timber! 
(fan expansion for Carcassonne)', 'Friar & Farmhand (fan expansion for Carcassonne)', 'Fruit Trader (fan expansion to Carcassonne)', 'The Gallows (fan expansion for Carcassonne)', 'Gold Mines (fan expansion for Carcassonne)', 'The Grim Reaple (fan expansion for Carcassonne)', 'Holzfäller & Müller (Fan Erweiterung für Carcassonne)', 'In the Stocks (fan expansion for Carcassonne)', 'Inn & Stable Owners (fan expansion to Carcassonne)', 'The Jester and the Minstrel (fan expansion to Carcassonne)', 'Jousting Tournament (fan expansion for Carcassonne)', 'Kettle of Fish (fan expansion to Carcassonne)', 'Lakelands (fan expansion for Carcassonne)', 'The Land Surveyors 2 (fan expansion for Carcassonne)', 'Landschaftskarten die Dritte (Fan-Erweiterung für Carcassonne)', 'Lavender Fields (fan expansion to Carcassonne)', 'Lord of the Manor (fan expansion for Carcassonne)', 'The Medieval Expansion (fan expansion for Carcassonne)', 'Merry Men (fan expansion to Carcassonne)', 'Mills and Bakeries (fan expansion for Carcassonne)', 'More River (fan expansion for Carcassonne)', 'The Mount of the Duke (fan expansion for Carcassonne)', 'Mountains (fan expansion for Carcassonne)', 'Nochmal neue Landschaftskarten (Fan-Erweiterung für Carcassonne)', 'The Ocean (fan expansion for Carcassonne)', 'The Orders of Chivalry (fan expansion for Carcassonne)', 'Outposts (fan expansion to Carcassonne)', 'Pirate Coast (fan expansion for Carcassonne)', 'The Pope of Avignon (fan expansion for Carcassonne)', 'Ramparts (fan expansion for Carcassonne)', 'River System (fan expansion for Carcassonne)', 'Spielbox-Almanach 25 Jahre Hans im Glück Beilage', 'Tithe barns (fan expansion to Carcassonne)', 'Troubadours (fan expansion for Carcassonne)', 'Truppenaufmarsch (Fan-Erweiterung für Carcassonne)', 'Upper Carcassonne (fan expansion for Carcassonne)', 'Wald (Fan Erweiterung für Carcassonne)', 'Die Wälder von Carcassonne V2 (Fan Erweiterung für Carcassonne)', 'Weg durch die Stadt (Fan-Erweiterung für 
Carcassonne)', 'Wells (fan expansion for Carcassonne)', 'Wells: Fountain of Youth (fan expansion for Carcassonne)', 'Wells: Wishing Wells (fan expansion for Carcassonne)', 'Wheat Fields (fan expansion for Carcassonne)', 'Wine Merchant (fan expansion to Carcassonne)', 'Каркассон: Дворяне и Башни', 'Каркассон: Наука и магия', 'Каркассон: Предместья и обитатели', 'Каркассон: Солове́й-Разбо́йник & Водяной', 'Каркассон: тайл Избушка']","['The Ark of the Covenant', 'Carcassonne für 2', 'Carcassonne Junior', 'Carcassonne: Amazonas', 'Carcassonne: Demonstration', 'Carcassonne: Gold Rush', 'Carcassonne: Hunters and Gatherers', 'Carcassonne: Over Hill and Dale', 'Carcassonne: Safari', 'Carcassonne: South Seas', 'Carcassonne: Star Wars', 'Carcassonne: The Castle', 'Carcassonne: The City', 'Carcassonne: The Discovery', 'Carcassonne: Winter Edition', 'New World: A Carcassonne Game', 'Travel Carcassonne']",['Klaus-Jürgen Wrede'],"['Doris Matthäus', 'Anne Pätzke', 'Chris Quilliams', 'Klaus-Jürgen Wrede']","['Hans im Glück', '999 Games', 'Albi', 'Bard Centrum Gier', 'Bergsala Enigma (Enigma)', 'Brain Games', 'cutia.ro', 'Devir', 'Fantasmagoria', 'Filosofia Éditions', 'Giochi Uniti', 'Grow Jogos e Brinquedos', 'Hobby World', 'Ísöld ehf.', 'Kaissa Chess & Games', 'Korea Boardgames Co., Ltd.', 'Lautapelit.fi', 'Midgaard Games', 'MINDOK', 'Möbius Games', 'Monkey Time', 'NeoTroy Games', 'Nordic Games ehf', 'Paper Iyagi', 'Piatnik', 'Ponva d.o.o.', 'Rio Grande Games', 'Schmidt Spiele', 'Smart Ltd', 'Stupor Mundi', 'SuperHeated Neurons', 'Swan Panasia Co., Ltd.', 'Venice Connection', 'Ventura Games', 'Z-Man Games']",161299,1716,582,7383
2,13,Catan,"In CATAN (formerly The Settlers of Catan), players try to be the dominant force on the island of Catan by building settlements, cities, and roads. On each turn dice are rolled to determine what resources the island produces. Players build by spending resources (sheep, wheat, wood, brick and ore) that are depicted by these resource cards; each land type, with the exception of the unproductive desert, produces a specific resource: hills produce brick, forests produce wood, mountains produce ore, fields produce wheat, and pastures produce sheep.&#10;&#10;Setup includes randomly placing large hexagonal tiles (each showing a resource or the desert) in a honeycomb shape and surrounding them with water tiles, some of which contain ports of exchange. Number disks, which will correspond to die rolls (two 6-sided dice are used), are placed on each resource tile. Each player is given two settlements (think: houses) and roads (sticks) which are, in turn, placed on intersections and borders of the resource tiles. Players collect a hand of resource cards based on which hex tiles their last-placed house is adjacent to. A robber pawn is placed on the desert tile.&#10;&#10;A turn consists of possibly playing a development card, rolling the dice, everyone (perhaps) collecting resource cards based on the roll and position of houses (or upgraded cities&mdash;think: hotels) unless a 7 is rolled, turning in resource cards (if possible and desired) for improvements, trading cards at a port, and trading resource cards with other players. If a 7 is rolled, the active player moves the robber to a new hex tile and steals resource cards from other players who have built structures adjacent to that tile.&#10;&#10;Points are accumulated by building settlements and cities, having the longest road and the largest army (from some of the development cards), and gathering certain development cards that simply award victory points. 
When a player has gathered 10 points (some of which may be held in secret), he announces his total and claims the win.&#10;&#10;CATAN has won multiple awards and is one of the most popular games in recent history due to its amazing ability to appeal to experienced gamers as well as those new to the hobby.&#10;&#10;Die Siedler von Catan was originally published by KOSMOS and has gone through multiple editions. It was licensed by Mayfair and has undergone four editions as The Settlers of Catan. In 2015, it was formally renamed CATAN to better represent itself as the core and base game of the CATAN series. It has been re-published in two travel editions, portable edition and compact edition, as a special gallery edition (replaced in 2009 with a family edition), as an anniversary wooden edition, as a deluxe 3D collector's edition, in the basic Simply Catan, as a beginner version, and with an entirely new theme in Japan and Asia as Settlers of Catan: Rockman Edition. Numerous spin-offs and expansions have also been made for the game.&#10;&#10;",1995,3,4,120,60,120,⋯,"['Animals: Sheep', 'Components: Hexagonal Tiles', 'Components: Wooden pieces & boards', 'Game: Catan', 'Promotional: Promo Board Games']","['20 Jahre Darmstadt Spielt', 'Brettspiel Adventskalender 2015', 'Catan Austria / Wien meets Catan', 'Catan Geographies: Austria', 'Catan Geographies: Bayern Edition', 'Catan Geographies: Corsica', 'Catan Geographies: Georgia', 'Catan Geographies: Kennessee', 'Catan Geographies: Mallorca', 'Catan Geographies: North Rhine – Westphalia', 'Catan Geographies: Rickshaw Run', 'Catan Geographies: Settlers of Hesse', 'Catan Geographies: The Carolinas', 'Catan Länderszenarien: Polen', 'Catan Rhein-Main-Neckar', 'Catan Scenario: Crop Trust', 'Catan Scenario: Durango', 'Catan Scenarios: Easter Bunny', 'Catan Scenarios: #WeStayHome', 'Catan Scenarios: Big Game Big Honor', 'Catan Scenarios: Catanimals', 'Catan Scenarios: Frenemies', 'Catan Scenarios: Global Warming', 'Catan 
Scenarios: Helpers of Catan', 'Catan Scenarios: Oil Springs', 'Catan World Championship Berlin 2014 Special', 'Catan: 5-6 Player Extension', 'Catan: 999 Games 25 jaar Expansion', 'Catan: Catakatoa', 'Catan: Catan Day 2015 Exclusive Expansion', 'Catan: Cities & Knights', 'Catan: Cities & Knights – 5-6 Player Extension', 'Catan: Cities & Knights – Legend of the Conquerors', 'Catan: Delmarva', 'Catan: Die Stoffräuber', 'Catan: Event Cards', 'Catan: Explorers & Pirates', 'Catan: Explorers & Pirates – 5-6 Player Extension', 'Catan: Hawaii (Szenario für Seefahrer)', 'Catan: High Priests of the Inkas', 'Catan: Indiana & Ohio', 'Catan: New England', 'Catan: New York', 'Catan: Penn-Jersey', 'Catan: Playmat Desert', 'Catan: Playmat Gold', 'Catan: Seafarers', 'Catan: Seafarers Scenario – Legend of the Sea Robbers', 'Catan: Seafarers – 5-6 Player Extension', 'Catan: Seefahrer – 20 Jahre Jubiläums-Edition', 'Catan: Szenario Der Kölner Dom', 'Catan: Traders & Barbarians', 'Catan: Traders & Barbarians – 5-6 Player Extension', 'Catan: Treasures, Dragons & Adventurers', 'Der Hafenmeister', 'Heroes & Capitols (fan expansion for Settlers of Catan)', 'Hexen, Zauberer & Drachen (fan expansion for Catan: Cities and Knights)', 'Katani pankur', 'Kirche, Glaube & Reformation (fan expansion for Catan: Cities and Knights)', 'De Kolonisten van Catan: De Diamanten', 'De Kolonisten van Catan: De drie Handelsteden van Noord-Nederland', 'De Kolonisten van Catan: De Koloniën', 'De Kolonisten van Catan: De Specialisten', 'De Kolonisten van Catan: De Wereldwonderen', 'De Kolonisten van Catan: De Woestijnruiters', 'De Kolonisten van Catan: Het Grote Kanaal', 'Mayfair Game Variants & Mini-Expansions Set #1', 'Die Pioniere (fan expansion for The Settlers of Catan)', ""Saggsen-Gadan: De säggs'schn Siedler / Catan-OFFENSIVE in Chemnitz"", 'Settlers of Catan Scenario: The Jungle', 'Settlers of Catan Scenario: The Volcano', 'The Settlers of Catan: The Fishermen of Catan', 'The Settlers of Catan: The Great 
River', 'Settlers of New Catan (and extra modules)', 'Die Siedler von Catan: Atlantis – Szenarien & Varianten', 'Die Siedler von Catan: Das Buch zum Spielen', 'Die Siedler von Catan: Der Schokoladenmarkt', 'Die Siedler von Catan: Die große Karawane', 'Die Siedler von Catan: Hispania Edition', 'Die Siedler von Catan: Historische Szenarien', 'Die Siedler von Catan: Historische Szenarien II', 'Die Siedler von Catan: Hochzeitsturm', 'Die Siedler von Catan: Renaissance in der Steiermark & Burgbau auf Chaffenberch', 'Die Siedler von Catan: Rincewind und der Tourist / Die Gilden von Ankh-Morpork', 'Die Siedler von Catan: Thüringen Edition', 'Die Siedler von Luxemburg', 'World Wonders (fan expansion for Catan)']","['Baden-Württemberg Catan', 'Catan Geographies: Germany', 'Catan Histories: Merchants of Europe', 'Catan Histories: Rise of the Inkas', 'Catan Histories: Settlers of America – Trails to Rails', 'Catan Histories: Struggle for Rome', 'Catan: Ancient Egypt', 'Catan: Big Game Event Kit', 'Catan: Core + China Map', 'Catan: Family Edition', 'Catan: Portable Edition', 'Catan: Starfarers', 'Catan: Traveler – Compact Edition', 'The Communication in Catan', 'A Game of Thrones: Catan – Brotherhood of the Watch', 'The Kids of Catan', 'De Kolonisten van de Lage Landen', 'The Settlers of Canaan', 'Settlers of Catan: Gallery Edition', 'Settlers of Catan: Rockman Edition', 'The Settlers of the Stone Age', 'The Settlers of Zarahemla', 'Die Siedler von Catan: Junior', 'Die Siedler von Nürnberg', 'Simply Catan', 'Star Trek: Catan', 'The Starfarers of Catan', 'Das Wasser des Lebens', 'Wien Catan']",['Klaus Teuber'],"['Volkan Baga', 'Tanja Donner', 'Pete Fenlon', 'Jason Hawkins', 'Michaela Kienle', 'Harald Lieske', 'Michael Menzel', 'Marion Pott', 'Matt Schwabel', 'Franz Vohwinkel', 'Stephen Graham Walsh']","['KOSMOS', '999 Games', 'Albi', 'Asmodee', 'Astrel Games', 'Bergsala Enigma (Enigma)', 'Brädspel.se', 'Brain Games', 'Broadway Toys LTD', 'Capcom Co., Ltd.', 'Catan Studio', 
'Competo / Marektoy', 'danspil', 'Descartes Editeur', 'Devir', 'Dexy Co', 'Eurogames', 'Filosofia Éditions', 'Galakta', 'Giochi Uniti', 'GP Games', 'Grow Jogos e Brinquedos', 'HaKubia', 'Hanayama', 'Hobby World', 'Ideal Board Games', 'Igroljub', 'IntelliGames.BG', 'Ísöld ehf.', 'Kaissa Chess & Games', 'Korea Boardgames Co., Ltd.', 'L&M Games', 'Laser plus', 'Lautapelit.fi', 'Logojogos', 'Mayfair Games', 'Ninive Games', 'Paper Iyagi', 'Piatnik', 'Smart Ltd', 'Spilbræt.dk', 'Stupor Mundi', 'SuperHeated Neurons', 'Swan Panasia Co., Ltd.', 'Tilsit', 'Top Toys', 'TRY SOFT', 'Vennerød Forlag AS']",167733,2018,485,5890
3,68448,7 Wonders,"You are the leader of one of the 7 great cities of the Ancient World. Gather resources, develop commercial routes, and affirm your military supremacy. Build your city and erect an architectural wonder which will transcend future times.&#10;&#10;7 Wonders lasts three ages. In each age, players receive seven cards from a particular deck, choose one of those cards, then pass the remainder to an adjacent player. Players reveal their cards simultaneously, paying resources if needed or collecting resources or interacting with other players in various ways. (Players have individual boards with special powers on which to organize their cards, and the boards are double-sided). Each player then chooses another card from the deck they were passed, and the process repeats until players have six cards in play from that age. After three ages, the game ends.&#10;&#10;In essence, 7 Wonders is a card development game. Some cards have immediate effects, while others provide bonuses or upgrades later in the game. Some cards provide discounts on future purchases. Some provide military strength to overpower your neighbors and others give nothing but victory points. Each card is played immediately after being drafted, so you'll know which cards your neighbor is receiving and how her choices might affect what you've already built up. 
Cards are passed left-right-left over the three ages, so you need to keep an eye on the neighbors in both directions.&#10;&#10;Though the box of earlier editions is listed as being for 3&ndash;7 players, there is an official 2-player variant included in the instructions.&#10;&#10;",2010,2,7,30,30,30,⋯,"['Ancient: Babylon', 'Ancient: Egypt', 'Ancient: Greece', 'Digital Implementations: Board Game Arena', 'Game: 7 Wonders', 'Mechanism: Artificial Player', 'Mechanism: Tableau Building']","['7 Wonders: Armada', '7 Wonders: Babel', '7 Wonders: Catan', '7 Wonders: Cities', '7 Wonders: Leaders', '7 Wonders: Manneken Pis', '7 Wonders: Wonder Pack', 'Collection (fan expansion for 7 Wonders)', 'Empires (fan expansion for 7 Wonders)', 'Game Wonders (fan expansion for 7 Wonders)', 'Lost Wonders (fan expansion for 7 Wonders)', 'Modern Wonders (fan expansion for 7 Wonders)', 'More Wonders... (fan expansion for 7 Wonders)', 'Myths (fan expansion for 7 Wonders)', 'Ruins (fan expansion for 7 Wonders)', 'Sailors (fan expansion for 7 Wonders)']","['7 Wonders (Second Edition)', '7 Wonders Duel', '7 Wonders: Architects']",['Antoine Bauza'],"['Dimitri Chappuis', 'Miguel Coimbra', 'Etienne Hebinger', 'Cyril Nouvel']","['Repos Production', 'ADC Blackfire Entertainment', 'Asmodee', 'Asterion Press', 'Galápagos Jogos', 'Gém Klub Kft.', 'Hobby Japan', 'Kaissa Chess & Games', 'Korea Boardgames Co., Ltd.', 'Lautapelit.fi', 'Lifestyle Boardgames Ltd', 'NeoTroy Games', 'Rebel Sp. z o.o.', 'Siam Board Games']",120466,1567,1010,12105
4,36218,Dominion,"&quot;You are a monarch, like your parents before you, a ruler of a small pleasant kingdom of rivers and evergreens. Unlike your parents, however, you have hopes and dreams! You want a bigger and more pleasant kingdom, with more rivers and a wider variety of trees. You want a Dominion! In all directions lie fiefs, freeholds, and feodums. All are small bits of land, controlled by petty lords and verging on anarchy. You will bring civilization to these people, uniting them under your banner.&#10;&#10;But wait! It must be something in the air; several other monarchs have had the exact same idea. You must race to get as much of the unclaimed land as possible, fending them off along the way. To do this you will hire minions, construct buildings, spruce up your castle, and fill the coffers of your treasury. Your parents wouldn't be proud, but your grandparents, on your mother's side, would be delighted.&quot;&#10;&#10;&mdash;description from the back of the box&#10;&#10;In Dominion, each player starts with an identical, very small deck of cards. In the center of the table is a selection of other cards the players can &quot;buy&quot; as they can afford them. Through their selection of cards to buy, and how they play their hands as they draw them, the players construct their deck on the fly, striving for the most efficient path to the precious victory points by game end.&#10;&#10;Dominion is not a CCG, but the play of the game is similar to the construction and play of a CCG deck. The game comes with 500 cards. 
You select 10 of the 25 Kingdom card types to include in any given play&mdash;leading to immense variety.&#10;&#10;&mdash;user summary&#10;&#10;Part of the Dominion series.&#10;&#10;",2008,2,4,30,30,30,⋯,"['Crowdfunding: Wspieram', 'Game: Dominion', 'Misc: Mensa Select']","['Ancient Times (fan expansion for Dominion)', 'Animals (fan expansion for Dominion)', 'The Books of Magic (fan expansion for Dominion)', 'Dominion: Adventures', 'Dominion: Alchemisten & Reiche Ernte – Mixbox', 'Dominion: Alchemy', 'Dominion: Allies', 'Dominion: Black Market Promo Card', 'Dominion: Captain Promo Card', 'Dominion: Church Promo Card', 'Dominion: Cornucopia', 'Dominion: Dark Ages', 'Dominion: Die Intrige – Erweiterung', 'Dominion: Dismantle Promo Card', 'Dominion: Empires', 'Dominion: Envoy Promo Card', 'Dominion: Erweiterung – Basisspiel & Die Intrige', 'Dominion: Fan-Edition I', 'Dominion: Governor Promo Card', 'Dominion: Guilds', 'Dominion: Guilds & Cornucopia', 'Dominion: Hinterlands', 'Dominion: Höflinge Promo Card', 'Dominion: Intrigue (Second Edition)', 'Dominion: Intrigue – Update Pack', 'Dominion: Menagerie', 'Dominion: Nocturne', 'Dominion: Prince Promo Card', 'Dominion: Prosperity', 'Dominion: Renaissance', 'Dominion: Sauna / Avanto Promo Card', 'Dominion: Seaside', 'Dominion: Stash Promo Card', 'Dominion: Summon Promo Card', 'Dominion: Update Pack', 'Dominion: Walled Village Promo Card', 'Duel (fan expansion for Dominion)', 'Fairy Tale (fan expansion for Dominion)', 'Paradox (fan expansion for Dominion)', ""Pirate's Life (fan expansion for Dominion)"", 'Royal Court (fan expansion for Dominion)', 'Salvation (fan expansion for Dominion)', 'The Scrolls of Power (fan expansion for Dominion)', 'Stadt Land Spielt Limitierte Sonderdrucke 2015', 'Stadt Land Spielt Minierweiterungen 2016', 'Warmonger (fan expansion for Dominion)']","['Dominion (Second Edition)', 'Het Koninkrijk Dominion']",['Donald X. 
Vaccarino'],"['Matthias Catrein', 'Julien Delval', 'Tomasz Jedruszek', 'Ryan Laukat', 'Harald Lieske', 'Michael Menzel', 'Marcel-André Casasola Merkle', 'Claus Stephan', 'Christof Tisch']","['Rio Grande Games', '999 Games', 'Albi', 'Bard Centrum Gier', 'Conclave Editora', 'cutia.ro', 'Devir', 'Filosofia Éditions', 'Games Factory Publishing', 'Gém Klub Kft.', 'Hans im Glück', 'Hobby Japan', 'Hobby World', 'Kaissa Chess & Games', 'Korea Boardgames Co., Ltd.', 'Lautapelit.fi', 'Martinex', 'Runadrake', 'Smart Ltd', 'Stupor Mundi', 'Swan Panasia Co., Ltd.', 'Vennerød Forlag AS', 'Ystari Games']",106956,2009,655,8621


In [7]:
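# Reuse the `mydb` connection opened above and copy both tibbles into SQLite tables
# (overwrite = TRUE lets this cell be re-run without a "table already exists" error)
dbWriteTable(mydb, "ratings", ratings, overwrite = TRUE)
dbWriteTable(mydb, "details", details, overwrite = TRUE)

# List the tables now available in the database
dbListTables(mydb)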

In [8]:
dbGetQuery(mydb, 'SELECT * FROM ratings LIMIT 5')

num,id,name,year,rank,average,bayes_average,users_rated,url,thumbnail
<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>
105,30549,Pandemic,2008,106,7.59,7.487,108975,/boardgame/30549/pandemic,https://cf.geekdo-images.com/S3ybV1LAp-8SnHIXLLjVqA__micro/img/S4tXI3Yo7BtqmBoKINLLVUFsaJ0=/fit-in/64x64/filters:strip_icc()/pic1534148.jpg
189,822,Carcassonne,2000,190,7.42,7.309,108738,/boardgame/822/carcassonne,https://cf.geekdo-images.com/okM0dq_bEXnbyQTOvHfwRA__micro/img/VfLoKzfk3xj26ArmDu55qZ4sysw=/fit-in/64x64/filters:strip_icc()/pic6544250.png
428,13,Catan,1995,429,7.14,6.97,108024,/boardgame/13/catan,https://cf.geekdo-images.com/W3Bsga_uLP9kO91gZ7H8yw__micro/img/LA4OvGfQ_TXQ-2mhaIFZp2ITWpc=/fit-in/64x64/filters:strip_icc()/pic2419375.jpg
72,68448,7 Wonders,2010,73,7.74,7.634,89982,/boardgame/68448/7-wonders,https://cf.geekdo-images.com/RvFVTEpnbb4NM7k0IF8V7A__micro/img/9glsOs7zoTbkVpfDt5SHWJm-kRA=/fit-in/64x64/filters:strip_icc()/pic860217.jpg
103,36218,Dominion,2008,104,7.61,7.499,81561,/boardgame/36218/dominion,https://cf.geekdo-images.com/j6iQpZ4XkemZP07HNCODBA__micro/img/PVxqHWOLTb3n-4xe62LJadr_M0I=/fit-in/64x64/filters:strip_icc()/pic394356.jpg


Now, let's get the data ready for further analysis.



In [None]:
# Read a CSV file into a tibble
df_students <- read_csv(file = "https://raw.githubusercontent.com/MicrosoftDocs/ml-basics/master/data/grades.csv")

# Remove any rows with missing data
df_students <- df_students %>% 
  drop_na()

# Add a column "Pass" that specifies if a student passed or failed
# Assuming '60' is the grade needed to pass
df_students <- df_students %>% 
  mutate(Pass = Grade >= 60)

# Print the results
df_students


Good job! That went well!

## **Visualizing data with ggplot2**

Data frames provide a great way to explore and analyze rectangular data, but sometimes plotting the data can greatly enhance your ability to analyze the data, see underlying trends and raise new questions.

`ggplot2` is a package for creating elegant graphics for data analysis in R. Compared with other graphing systems, `ggplot2` provides a flexible and intuitive way of creating graphs by combining independent components of a graphic in a series of iterative steps. This allows you to create visualisations that match your specific needs rather than being limited to sets of predefined graphics.

Now let's see this in action. We'll start with a simple bar chart that shows the grade of each student.


In [None]:
ggplot(data = df_students) +
  geom_col(mapping = aes(x = Name, y = Grade))


Well, that worked; but the chart could use some improvements to make it clearer what we're looking at. We'll get to that. Let's first walk through the process of creating graphics in ggplot2.

You initialize a graphic using the function `ggplot()` and the data frame to use for the plot. `ggplot(data = df_students)` basically creates an empty graph which you can add layers to using a `+`.

`geom_col()` then adds a layer of bars whose heights correspond to the variable mapped to the y axis. The `mapping` argument is always paired with `aes()`, which specifies how **variables in the data** are mapped to visual properties of the plot (what goes into `aes()` are variables found in the data). In our case, we mapped `Name` to the x axis and `Grade` to the y axis.

And that's it! We'll follow and extend this blueprint to make different types of graphs.
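
The layering idea can be checked directly by inspecting a plot object as it is built. The sketch below is only an illustration: `toy_grades` is a small made-up data frame (not the course data), and the code counts the layers before and after `geom_col()` is added.

```r
library(ggplot2)

# Hypothetical data, for illustration only
toy_grades <- data.frame(
  Name = c("Ana", "Ben", "Cai"),
  Grade = c(48, 72, 65)
)

# Step 1: initialise the plot; it has no layers yet
p_empty <- ggplot(data = toy_grades)
length(p_empty$layers)

# Step 2: add a layer of bars with `+`
p_bars <- p_empty +
  geom_col(mapping = aes(x = Name, y = Grade))
length(p_bars$layers)

# Printing the object draws the chart
p_bars
```

The first count should be 0 and the second 1, which is the "empty graph plus layers" blueprint in miniature.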

Now, let's improve the visual elements of the plot. For example, the following code:

-   Specifies the color of the bar chart.

-   Adds a title to the chart (so we know what it represents)

-   Adds labels to the X and Y axes (so we know which axis shows which data)


In [None]:
# Change the default grey background
theme_set(theme_light())


ggplot(data = df_students) +
  geom_col(mapping = aes(x = Name, y = Grade),
           # Specify color and transparency of the bars
           fill = "midnightblue", alpha = 0.7) +
  # Add a title to the chart
  ggtitle("Student Grades") +
  # Add labels to axes
  xlab("Student") +
  ylab("Grade")


Whoa! That's a step in the right direction. We can improve this even further using `ggplot2`'s comprehensive theming system. Themes are a powerful way to customize the non-data components of your plots, i.e. titles, labels, fonts, background, gridlines and legends. You can learn more about modifying components of a theme by running `?theme`.

For instance, let's:

-   Center the title

-   Add a grid (to make it easier to determine the values for the bars)

-   Rotate the X markers (so we can read them)


In [None]:
ggplot(data = df_students) +
  geom_col(mapping = aes(x = Name, y = Grade),
           fill = "midnightblue", alpha = 0.7) +
  ggtitle("Student Grades") +
  xlab("Student") +
  ylab("Grade") +
  theme(
    # Center the title
    plot.title = element_text(hjust = 0.5),
    
    # Add a grid (to make it easier to determine the bar values)
    panel.grid = element_blank(),
    panel.grid.major.y = element_line(color = "gray", linetype = "dashed", linewidth = 0.5),
    
    # Rotate the X markers so we can read them
    axis.text.x = element_text(angle = 90)
    
  )
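
Theme settings like these tend to be repeated from plot to plot. One option, sketched below under the assumption that a shared look is wanted, is to store the `theme()` call in a variable and add it to each plot. The names `toy_grades` and `grade_theme` are made up for this illustration.

```r
library(ggplot2)

# Hypothetical data, for illustration only
toy_grades <- data.frame(
  Name = c("Ana", "Ben", "Cai"),
  Grade = c(48, 72, 65)
)

# Store the theme tweaks once (the name `grade_theme` is our own choice)
grade_theme <- theme(
  plot.title = element_text(hjust = 0.5),  # center the title
  axis.text.x = element_text(angle = 90)   # rotate the x-axis labels
)

# Reuse the stored theme on any plot by adding it with `+`
p <- ggplot(data = toy_grades) +
  geom_col(mapping = aes(x = Name, y = Grade)) +
  labs(title = "Student Grades", x = "Student", y = "Grade") +
  grade_theme
p
```

Here `labs()` sets the title and both axis labels in one call, as an alternative to separate `ggtitle()`, `xlab()` and `ylab()` calls.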


Good job! Perhaps the bar chart would be more informative if the student names were in a certain order, right? This is a good chance to showcase how to use `dplyr` and `ggplot2` to derive insights from your data.

So, let's reorder the levels of the Name column in descending order, based on the Grade column and then plot this.


In [None]:
df_students %>% 
  mutate(Name = fct_reorder(Name, Grade, .desc = TRUE)) %>% 
  ggplot() +
  geom_col(mapping = aes(x = Name, y = Grade),
           fill = "midnightblue", alpha = 0.7) +
  ggtitle("Student Grades") +
  xlab("Student") +
  ylab("Grade") +
  theme(
    plot.title = element_text(hjust = 0.5),
    
    panel.grid = element_blank(),
    panel.grid.major.y = element_line(color = "gray", linetype = "dashed", linewidth = 0.5),
    
    axis.text.x = element_text(angle = 90)
    
  )


That's a much better plot - both in aesthetics and information. For instance, we can quickly and easily discern how each student performed.
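
To see what `fct_reorder()` does to a column, it can help to look at the factor levels directly, since `ggplot2` draws a discrete axis in level order. The sketch below uses made-up names and grades rather than the student data.

```r
library(forcats)

# Hypothetical names and grades, for illustration only
toy_names  <- c("Ana", "Ben", "Cai")
toy_grades <- c(48, 72, 65)

# By default, factor levels are sorted alphabetically
levels(factor(toy_names))

# fct_reorder() sorts the levels by another variable instead;
# .desc = TRUE puts the name with the highest grade first
reordered <- fct_reorder(toy_names, toy_grades, .desc = TRUE)
levels(reordered)
```

With these values the reordered levels run from the highest grade to the lowest: Ben, Cai, Ana.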

## **Getting started with statistical analysis**

Now that you know how to use R to manipulate and visualize data, you can start analyzing it 🎉.

A lot of data science is rooted in *statistics*, so we'll explore some basic statistical techniques.

> **Note**: This is not intended to teach you statistics - that's much too big a topic for this notebook. It will however introduce you to some statistical concepts and techniques that data scientists use as they explore data in preparation for machine learning modeling.

### Descriptive statistics and data distribution

When examining a *variable* (for example a sample of student grades), data scientists are particularly interested in its *distribution* (in other words, how are all the different grade values spread across the sample). The starting point for this exploration is often to visualize the data as a histogram, and see how frequently each value for the variable occurs.

So what `geom` are we going to use? "`geom_histogram`", you'll say - because you are already getting the gist/geom of it 🥳!


In [None]:
# Visualise distribution of the grades in a histogram
ggplot(data = df_students) +
  geom_histogram(mapping = aes(x = Grade))


Alright. This certainly tells us something about our data - for instance most of the grades seem to be around 50. However, `ggplot2` is extremely flexible and allows us to experiment with different function arguments to better reveal the story behind our data. By looking at `?geom_histogram` we can experiment with arguments such as `binwidth` and `boundary` as shown:



In [None]:
# Visualise distribution of the grades in a histogram
ggplot(data = df_students) +
  geom_histogram(mapping = aes(x = Grade), binwidth = 20, boundary = 0.5, fill = "midnightblue", alpha = 0.7) +
  xlab('Grade') +
  ylab('Frequency') +
  theme(plot.title = element_text(hjust = 0.5))


Much better! The histogram for grades has a roughly symmetric shape, where the most frequently occurring grades tend to be in the middle of the range (around 50), with fewer grades at the extreme ends of the scale.
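
The effect of `binwidth` and `boundary` can be approximated by hand with base R. The sketch below is not `ggplot2`'s internal code: it assumes bins of width 20 with one edge at 0.5, and counts a made-up vector of grades (`toy_grades` is hypothetical) in each bin.

```r
# Hypothetical grades, for illustration only
toy_grades <- c(3, 15, 26, 35, 36, 42, 47, 49, 50, 50,
                52, 55, 57, 63, 64, 70, 74, 82, 97)

# Bin edges: width 20, with one edge sitting on 0.5
bin_edges <- seq(from = 0.5 - 20, to = 100.5, by = 20)

# Assign each grade to a bin and count the grades per bin
bin_counts <- table(cut(toy_grades, breaks = bin_edges))
bin_counts
```

Each count corresponds to the height of one histogram bar; a smaller bin width gives more, narrower bars.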

### Measures of central tendency

To understand the distribution better, we can examine so-called *measures of central tendency*, which is a fancy way of describing statistics that represent the "middle" of the data. The goal is to find a "typical" value. Common ways to define the middle of the data include:

-   The *mean*: A simple average, calculated by adding together all of the values in the sample set and then dividing the total by the number of samples.

-   The *median*: The middle value when all of the sample values are sorted in order.

-   The *mode*: The most commonly occurring value in the sample set^\*^.

> ^\*^ Of course, in some sample sets, there may be a tie for the most common value - in which case the dataset is described as *bimodal* or even *multimodal*.

Let's calculate these values, along with the minimum and maximum values for comparison, and show them on the histogram.

Base R does not provide a function for finding the statistical mode (the built-in `mode()` function reports an object's storage type instead). But worry not, `statip::mfv` returns the most frequent value(s) (or mode(s)) found in a vector. Other pretty awesome workarounds can be found on this [stackoverflow thread](https://stackoverflow.com/questions/2547402/how-to-find-the-statistical-mode).
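
For readers who would rather not install an extra package, one workaround can be sketched in a few lines of base R. The helper name `get_modes` is made up for this illustration; it counts how often each distinct value occurs and keeps the value(s) with the highest count.

```r
# Hypothetical helper: return the most frequent value(s) in a vector
get_modes <- function(x) {
  distinct_values <- unique(x)
  # Count how many times each distinct value appears in x
  counts <- tabulate(match(x, distinct_values))
  # Keep every value that reaches the maximum count (handles ties)
  distinct_values[counts == max(counts)]
}

# A single mode
get_modes(c(1, 2, 2, 3))

# A tie: both values are returned (a bimodal sample)
get_modes(c(1, 1, 2, 2, 3))
```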


In [None]:
# Load statip into the current R session
library(statip)

# Get summary statistics
min_val <- min(df_students$Grade)
max_val <- max(df_students$Grade)
mean_val <- mean(df_students$Grade)
med_val <- median(df_students$Grade)
mod_val <- mfv(df_students$Grade)

# Print the stats
cat(
  "Minimum: ", round(min_val, 2),
   "\nMean: ", round(mean_val, 2),
   "\nMedian: ", round(med_val, 2),
   "\nMode: ", round(mod_val, 2),
   "\nMaximum: ", round(max_val, 2)
)


Now let's incorporate these statistics onto our graph.



In [None]:
# Plot a histogram
ggplot(data = df_students) +
  geom_histogram(mapping = aes(x = Grade), binwidth = 20, fill = "midnightblue", alpha = 0.7, boundary = 0.5) +
  
# Add lines for the statistics
  geom_vline(xintercept = min_val, color = 'gray33', linetype = "dashed", linewidth = 1.3) +
  geom_vline(xintercept = mean_val, color = 'cyan', linetype = "dashed", linewidth = 1.3) +
  geom_vline(xintercept = med_val, color = 'red', linetype = "dashed", linewidth = 1.3) +
  geom_vline(xintercept = mod_val, color = 'yellow', linetype = "dashed", linewidth = 1.3) +
  geom_vline(xintercept = max_val, color = 'gray33', linetype = "dashed", linewidth = 1.3) +
  
# Add titles and labels
  ggtitle('Data Distribution')+
  xlab('Value')+
  ylab('Frequency')+
  theme(plot.title = element_text(hjust = 0.5))


> `geom_vline()` adds a vertical reference line to a plot.

Good job!

For the grade data, the mean, median, and mode all seem to be more or less in the middle of the minimum and maximum, at around 50.

Another way to visualize the distribution of a variable is to use a *box* plot (sometimes called a *box-and-whiskers* plot). Let's create one for the grade data.


In [None]:
# Plot a box plot
ggplot(data = df_students) +
  geom_boxplot(mapping = aes(x = 1, y = Grade), fill = "#E69F00", color = "gray23", alpha = 0.7) +
  
  # Add titles and labels
  ggtitle("Data Distribution") +
  xlab("") +
  ylab("Grade") +
  theme(plot.title = element_text(hjust = 0.5))


The box plot shows the distribution of the grade values in a different format to the histogram. The *box* part of the plot shows where the inner two *quartiles* of the data reside - so in this case, half of the grades are between approximately 36 and 63. The *whiskers* extending from the box show the outer two quartiles; so the other half of the grades in this case are between 0 and 36 or 63 and 100. The line in the box indicates the *median* value.
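
The values that define the box can also be computed directly. The sketch below uses a made-up vector of grades (`toy_grades` is hypothetical) and base R's `quantile()` and `IQR()` to find the quartiles and the width of the box.

```r
# Hypothetical grades, for illustration only
toy_grades <- c(3, 15, 26, 35, 36, 42, 47, 49, 50, 50,
                52, 55, 57, 63, 64, 70, 74, 82, 97)

# First quartile, median and third quartile
quartiles <- quantile(toy_grades, probs = c(0.25, 0.5, 0.75))
quartiles

# Width of the box: the interquartile range (third minus first quartile)
box_width <- IQR(toy_grades)
box_width
```

The 50% entry is the median, which is the line drawn inside the box.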

It's often useful to combine histograms and box plots, with the *box plot's orientation changed* to align it with the histogram (in some ways, it can be helpful to think of the histogram as a "front elevation" view of the distribution, and the box plot as a "plan" view of the distribution from above). Since we may need to plot the histograms and box plots for different variables, it will be convenient to write a function. Functions allow you to automate common tasks in a more powerful and general way than copy-and-pasting.

Let's get right to it! Functions in R are generally defined in this fashion:

`name <- function(variables) {return(value)}`
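
As a minimal illustration of that pattern, here is a tiny function with one argument and one returned value. The name `grade_range` and the input values are made up for this sketch.

```r
# A small example function (the name `grade_range` is our own choice)
grade_range <- function(grades) {
  # The body computes a value from the arguments...
  spread <- max(grades) - min(grades)
  # ...and return() hands it back to the caller
  return(spread)
}

# Call the function with a hypothetical vector of grades
grade_range(c(48, 72, 65))
```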

> [patchwork](https://patchwork.data-imaginist.com/) extends the `ggplot2` API by providing mathematical operators (such as `+` or `/`) for combining multiple plots. Yes, as easy as that!


In [None]:
library(patchwork)
# Create a function that we can reuse
show_distribution <- function(var_data, binwidth) {
  
  # Get summary statistics by first extracting values from the column
  min_val <- min(pull(var_data))
  max_val <- max(pull(var_data))
  mean_val <- mean(pull(var_data))
  med_val <- median(pull(var_data))
  mod_val <- statip::mfv(pull(var_data))

  # Format the stats as a single string
  stats <- glue::glue(
  'Minimum: {format(round(min_val, 2), nsmall = 2)}
   Mean: {format(round(mean_val, 2), nsmall = 2)}
   Median: {format(round(med_val, 2), nsmall = 2)}
   Mode: {format(round(mod_val, 2), nsmall = 2)}
   Maximum: {format(round(max_val, 2), nsmall = 2)}'
  )
  
  # Plot the histogram
  hist_gram <- ggplot(var_data) +
  geom_histogram(aes(x = pull(var_data)), binwidth = binwidth,
                 fill = "midnightblue", alpha = 0.7, boundary = 0.4) +
    
  # Add lines for the statistics
  geom_vline(xintercept = min_val, color = 'gray33', linetype = "dashed", linewidth = 1.3) +
  geom_vline(xintercept = mean_val, color = 'cyan', linetype = "dashed", linewidth = 1.3) +
  geom_vline(xintercept = med_val, color = 'red', linetype = "dashed", linewidth = 1.3) +
  geom_vline(xintercept = mod_val, color = 'yellow', linetype = "dashed", linewidth = 1.3) +
  geom_vline(xintercept = max_val, color = 'gray33', linetype = "dashed", linewidth = 1.3) +
    
  # Add titles and labels
  ggtitle('Data Distribution') +
  xlab('')+
  ylab('Frequency') +
  theme(plot.title = element_text(hjust = 0.5))
  
  # Plot the box plot
  bx_plt <- ggplot(data = var_data) +
  geom_boxplot(mapping = aes(x = pull(var_data), y = 1),
               fill = "#E69F00", color = "gray23", alpha = 0.7) +
    
    # Add titles and labels
  xlab("Value") +
  ylab("") +
  theme(plot.title = element_text(hjust = 0.5))
  
  
  # To return multiple outputs, use a `list`
  return(
    
    list(stats,
         # Combine histogram and box plot using library patchwork
         hist_gram / bx_plt)
    
        ) # End of returned outputs
  
} # End of function


Now that the `show_distribution()` function is done, let's get a variable/column to examine and then call the function.



In [None]:
# Get the variable to examine
col <- df_students %>% 
  select(Grade)

# Call the function
show_distribution(var_data = col, binwidth = 20)


All of the measures of central tendency are right in the middle of the data distribution, which is symmetric, with frequencies becoming progressively lower in both directions from the middle.

To explore this distribution in more detail, you need to understand that statistics is fundamentally about taking *samples* of data and using probability functions to *extrapolate information* about the full *population* of data. For example, the student data consists of 22 samples, and for each sample there is a grade value. You can think of each sample grade as a variable that's been randomly selected from the set of all grades awarded for this course. With enough of these random variables, you can calculate something called a *probability density function*, which estimates the distribution of grades for the full population.
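
The relationship between a sample and its population can be sketched with simulated data. Everything below is made up for illustration: the "population" is 10,000 values drawn from a normal distribution with an assumed mean of 50 and standard deviation of 15, and the sample size of 22 matches the student data.

```r
set.seed(42)  # make the random draws reproducible

# A simulated "population" of grades (assumed parameters, not real course data)
population <- rnorm(n = 10000, mean = 50, sd = 15)

# Draw a random sample of 22 grades, the same size as the student data
grade_sample <- sample(population, size = 22)

# The sample statistics serve as estimates of the population statistics
c(population_mean = mean(population), sample_mean = mean(grade_sample))
c(population_sd = sd(population), sample_sd = sd(grade_sample))
```

Rerunning the sketch with a different seed or a larger `size` shows how the sample estimates move around the population values.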

A density plot is a representation of the distribution of a numeric variable. It is a smoothed version of the histogram and is often used in the same kind of situation.

> `geom_density()` computes and draws a kernel density estimate of the variable mapped to the x axis.
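
A kernel density estimate can also be computed without plotting, using base R's `density()` function. The sketch below uses a made-up vector of grades (`toy_grades` is hypothetical) and checks one defining property of a density curve: the area under it is approximately 1.

```r
# Hypothetical grades, for illustration only
toy_grades <- c(3, 15, 26, 35, 36, 42, 47, 49, 50, 50,
                52, 55, 57, 63, 64, 70, 74, 82, 97)

# Kernel density estimate with default settings
kde <- density(toy_grades)

# The estimate is a grid of x positions with a density height at each one
head(data.frame(x = kde$x, density = kde$y))

# Approximate the area under the curve: sum of heights times the grid spacing
grid_step <- diff(kde$x)[1]
area <- sum(kde$y) * grid_step
area
```

Because the area is fixed at about 1, the y axis of a density plot shows density rather than counts.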


In [None]:
# Create a function that returns a density plot
show_density <- function(var_data) {
  
  # Get statistics
  mean_val <- mean(pull(var_data))
  med_val <- median(pull(var_data))
  mod_val <- statip::mfv(pull(var_data))
  
  
  # Plot the density plot
  density_plot <- ggplot(data = var_data) +
  geom_density(aes(x = pull(var_data)), fill="orangered", color="white", alpha=0.4) +
    
  # Add lines for the statistics
  geom_vline(xintercept = mean_val, color = 'cyan', linetype = "dashed", linewidth = 1.3) +
  geom_vline(xintercept = med_val, color = 'red', linetype = "dashed", linewidth = 1.3) +
  geom_vline(xintercept = mod_val, color = 'yellow', linetype = "dashed", linewidth = 1.3) +
    
  # Add titles and labels
  ggtitle('Data Density') +
  xlab('') +
  ylab('Density') +
  theme(plot.title = element_text(hjust = 0.5))
  
  
  
  return(density_plot) # End of returned outputs
  
} # End of function


# Get the density of Grade
col <- df_students %>% select(Grade)
show_density(var_data = col)


As expected from the histogram of the sample, the density shows the characteristic "bell curve" of what statisticians call a *normal* distribution with the mean and mode at the center and symmetric tails.

## **Summary**

Well done! There were a number of new concepts in here, so let's summarise.

Here we have:

1.  Made graphs with ggplot2

2.  Seen how to customise these graphs

3.  Calculated basic statistics, such as medians

4.  Looked at the spread of data using box plots and histograms

5.  Learned about samples vs populations

6.  Estimated what the population of grades might look like from a sample of grades.

In our next notebook we will look at spotting unusual data, and finding relationships between data.

## **Further Reading**

To learn more about the R packages and concepts you explored in this notebook, see the following documentation:

-   [Tidyverse packages](https://www.tidyverse.org/packages/)

-   [Patchwork](https://patchwork.data-imaginist.com/)

-   [Functions with R](https://skirmer.github.io/presentations/functions_with_r.html#1)
