# Scrape Data

The purpose of this notebook is to extract records from the raw data into a workable unified format.

This is loosely related to Named Entity Recognition (NER), though I'm not sure it fits neatly into that category academically: it is a rudimentary preprocessing step, closer to document intelligence.

No deduplication / entity resolution is performed within this step. See Notebook 4 for deduplication efforts.
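
As a rough illustration of what "extracting records" could mean for the awesome-list files loaded in this notebook, here is a minimal sketch. It assumes entries are bullets of the form `* [Name](url) - Description`; the `ListEntry` type, its field names and `tryParseEntry` are hypothetical and not necessarily what the later cells use.

```fsharp
open System.Text.RegularExpressions

// Hypothetical record shape; the field names are illustrative only.
type ListEntry =
    { Name: string
      Url: string
      Description: string option }

// Matches a bullet such as "* [Name](https://example.com) - Description",
// optionally wrapped in ** for bold. The description part is optional.
// Note: the name keeps whatever sits inside the brackets, e.g. star counts.
let entryPattern =
    Regex(@"^\s*[*-]\s+\**\[(?<name>[^\]]+)\]\((?<url>[^)\s]+)\)\**(?:\s+-\s+(?<desc>.+))?$")

// Returns Some entry for a matching bullet line, None for anything else
// (headings, prose, table rows).
let tryParseEntry (line: string) : ListEntry option =
    let m = entryPattern.Match(line)
    if m.Success then
        let desc = m.Groups.["desc"]
        Some
            { Name = m.Groups.["name"].Value
              Url = m.Groups.["url"].Value
              Description = if desc.Success then Some(desc.Value.Trim()) else None }
    else
        None
```

Applied line by line (for example via `content.Split('\n') |> Array.choose tryParseEntry`), this keeps the bullets and silently drops everything else, which is about as far as this step goes: no cleanup of names and no deduplication.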

## Load raw data into memory

Load the files using F#, as I'm not sure how to load files in JavaScript.

NOTE: A potential improvement is to avoid holding the content of all files in memory at once. However, that's a bridge to cross when it becomes a problem.
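
If memory ever does become a problem, one option is to read the files lazily. The sketch below is only an assumption about how that could look, not what this notebook does; `lazyMarkdownFiles` is a hypothetical helper name.

```fsharp
open System.IO

// Hypothetical helper: yields (file name, content) pairs one at a time,
// so only the file currently being processed needs to be held in memory.
let lazyMarkdownFiles (directory: string) : seq<string * string> =
    Directory.EnumerateFiles(directory, "*.md", SearchOption.AllDirectories)
    // Same license exclusion as in the eager version
    |> Seq.filter (fun (filePath: string) ->
        Path.GetFileName(filePath).ToLowerInvariant() <> "license.md")
    // Seq.map is lazy: File.ReadAllText only runs when an element is consumed
    |> Seq.map (fun (filePath: string) ->
        Path.GetFileName(filePath), File.ReadAllText(filePath))
```

Iterating with `for (name, content) in lazyMarkdownFiles "../data/raw" do ...` would then process one file at a time. The trade-off is that every new enumeration of the sequence reads the files from disk again, so it only pays off when each file is visited once.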

In [2]:
open System.IO

let directory = "../data/raw"
let searchPattern = "*.md"

let markdownFiles =
    Directory.EnumerateFiles(directory, searchPattern, SearchOption.AllDirectories)
    // Exclude license files (case-insensitive match), as they are not part of the data
    |> Seq.filter (fun filePath -> Path.GetFileName(filePath).ToLowerInvariant() <> "license.md")

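// Keyed by file name only: two files with the same name in different
// subdirectories would overwrite each other.
let fileDict = new System.Collections.Generic.Dictionary<string, string>()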
for filePath in markdownFiles do
    let fileContent = File.ReadAllText(filePath)
    fileDict.[Path.GetFileName(filePath)] <- fileContent

fileDict

key,value
awesome-fsharp.md,"# <img src=""http://fsprojects.github.io/assets/logo.png"" width=""26""> Awesome F# # [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome) [![Join the chat at https://gitter.im/VPashkov/awesome-fsharp](https://badges.gitter.im/VPashkov/awesome-fsharp.svg)](https://gitter.im/VPashkov/awesome-fsharp?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) A curated list of awesome F# frameworks, libraries, software and resources. ## Table of Contents - [Awesome F#](#)  - [F# wrappers for popular .NET libraries](#f-wrappers-for-popular-net-libraries)  - [Actor frameworks](#actor-frameworks)  - [Build tools](#build-tools)  - [Cloud](#cloud)  - [Compilers](#compilers)  - [Concurrent, asynchronous and parallel programming](#concurrent-asynchronous-and-parallel-programming)  - [Configuration](#configuration)  - [Data Science](#data-science)  - [Development Tools](#development-tools)  - [IDE](#ide)  - [Editor plugins](#editor-plugins)  - [Performance analysis](#performance-analysis)  - [Game development](#game-development)  - [General purpose libraries](#general-purpose-libraries)  - [GUI](#gui)  - [HTTP Clients](#http-clients)  - [Logging](#logging)  - [Package Management](#package-management)  - [Parsing](#parsing)  - [PreCompilation](#precompilation)  - [Search](#search)  - [Serialization](#serialization)  - [Simulation](#simulation)  - [Testing](#testing)  - [Type providers](#type-providers)  - [Creating type providers](#creating-type-providers)  - [Visualization](#visualization)  - [Web frameworks](#web-frameworks)  - [.Net Core Templates](#net-core-templates)  - [Resources](#resources)  - [Blogs](#blogs)  - [Cheatsheets](#cheatsheets)  - [Community](#community)  - [Other Lists](#other-lists)  - [Websites](#websites)  - [Videos](#videos)  - [Courses](#courses) ## F# wrappers for popular .NET libraries Looking to have a more 
enjoyable experience when consuming a popular .NET library? Here is a quick table. .NET Library | F# Wrapper -|- [Avalonia](https://github.com/AvaloniaUI/Avalonia) | [Avalonia.FuncUI](https://github.com/fsprojects/Avalonia.FuncUI) [ASP.NET Core](https://github.com/dotnet/aspnetcore) | [Giraffe](https://github.com/giraffe-fsharp/Giraffe) (+ optionally [Saturn](https://github.com/SaturnFramework/Saturn)) [ASP.NET Core Blazor](https://github.com/dotnet/aspnetcore/tree/main/src/Components) | [Bolero](https://github.com/fsbolero/Bolero) [MSTest](https://github.com/microsoft/testfx)/[NUnit](https://github.com/nunit/nunit)/[xUnit.net](https://github.com/xunit/xunit) | [FsUnit](https://github.com/fsprojects/FsUnit) [System.Text.Json](https://github.com/dotnet/runtime/tree/main/src/libraries/System.Text.Json) | [FSharp.SystemTextJson](https://github.com/Tarmil/FSharp.SystemTextJson) [WPF](https://github.com/dotnet/wpf) | [Elmish.WPF](https://github.com/elmish/Elmish.WPF) [Xamarin.Forms](https://github.com/xamarin/Xamarin.Forms) | [Fabulous](https://github.com/fabulous-dev/Fabulous) ## Actor frameworks * **[Akka.NET ★ 2239 ⧗ 0](https://github.com/akkadotnet/akka.net)** - Community-driven port of the popular Java/Scala framework Akka to .NET. [Apache 2.0] * [Akkling ★ 45 ⧗ 1](https://github.com/Horusiath/Akkling) - F# typed API for Akka.NET. [Apache 2.0] * [Cricket ★ 141 ⧗ 380](https://github.com/fsprojects/Cricket) - Actor framework for F#. [Unlicense] * [Orleankka ★ 175 ⧗ 5](https://github.com/OrleansContrib/Orleankka) - Functional API for Orleans Framework. [Apache 2.0] * **[Orleans ★ 2754 ⧗ 0](https://github.com/dotnet/orleans)** - Distributed Virtual Actor Model. [MIT] * **[Proto.actor ★ 692 ⧗ 0](https://github.com/AsynkronIT/protoactor-dotnet)** - Cross-platform actor framework for .NET, GO, JAVA and KOTLIN. [Apache 2.0] ## Build tools * **[FAKE ★ 733 ⧗ 0](https://github.com/fsharp/FAKE)** - ""F# Make"" is a cross platform build automation system. 
[Apache 2.0] * **[Xake ★ 8 ⧗ 0](https://github.com/OlegZee/Xake)** - Another MAKE utility implementation on F#, fully declarative with no-brain parallelism, inspired by Shake. [MIT] ## Cloud * [FsFirestore](https://github.com/mrbandler/FsFirestore) - Functional F# library to access Firestore database hosted on Google Cloud Platform (GCP) or Firebase. [MIT] * [Chia ★ 3 ⧗ 0](https://github.com/DanpowerGruppe/Chia) - Chia is a F# library which contains HelperFunctions for reporting, logging and Azure cloud operations. [Apache-2.0] ## Code Generation * [Hawaii](https://github.com/Zaid-Ajaj/Hawaii) - A dotnet CLI tool to generate type-safe F# clients from OpenAPI/Swagger services. ## Compilers * [F# Compiler Services ★ 159 ⧗ 0](https://github.com/fsharp/FSharp.Compiler.Service) - The F# Compiler, F# Interactive scripting engine and F# editing services as a component library. [Apache 2.0] * **[Fable ★ 808 ⧗ 0](https://github.com/fable-compiler/Fable)** - F# to JavaScript Compiler. [Apache 2.0] * [Fez ★ 49 ⧗ 0](https://github.com/kjnilsson/fez) - F# to Erlang compiler. [MIT] * **[FSharp ★ 1549 ⧗ 0](https://github.com/fsharp/fsharp)** - The Open Edition of the F# compiler, core library and tools. [Apache 2.0] * [FunScript ★ 446 ⧗ 64](https://github.com/ZachBray/FunScript) - F# to JavaScript compiler with JQuery etc. mappings through a TypeScript type provider. [Apache-2.0] * [Juniper ★ 73 ⧗ 0](https://github.com/calebh/Juniper) - Functional Reactive Programming for the Arduino and other microcontrollers. [MIT] * [Pengines.Client ★ 3 ⧗ 0](https://github.com/ninjarobot/Pengines.Client) - sandboxed Prolog environment. [BSD-2-Clause] * **[Visual F# ★ 988 ⧗ 0](https://github.com/Microsoft/visualfsharp)** - The Visual F# compiler and tools. 
[Apache 2.0] ## Concurrent, asynchronous and parallel programming * [FIO](https://github.com/iyyel/fio) - A type-safe, highly concurrent and asynchronous library for F# based on pure functional programming [GNU v3] * [FSharp.Control.AsyncSeq ★ 28 ⧗ 12](https://github.com/fsprojects/FSharp.Control.AsyncSeq) - Collection of asynchronous programming utilities for F#. [Apache 2.0] * [FSharp.Control.FusionTasks](https://github.com/kekyo/FSharp.Control.FusionTasks) - F# Async workflow <--> .NET Task/ValueTask easy seamless interoperability library. * [FSharpx.Async ★ 37 ⧗ 56](https://github.com/fsprojects/FSharpx.Async) - Collection of asynchronous programming utilities for F#. [Apache 2.0] * [Giraffe.Tasks ★ 13 ⧗ 0](https://github.com/giraffe-fsharp/giraffe.tasks) - task computation expression to work natively with .NET's Tasks from an F# application. [Apache 2.0] * [Hopac ★ 268 ⧗ 7](https://github.com/Hopac/Hopac) - Concurrent ML style concurrent programming library for F#. [MIT] * [Ply](https://github.com/crowded/ply) - High performance System.Threading.(Value)Task computation expressions for F#. [MIT] * [Reaction.AsyncRx](https://github.com/dbrattli/Reaction) - An implementation of Async Observables in F# for .NET and Fable. [MIT] * [TaskBuilder.fs](https://github.com/rspeele/TaskBuilder.fs) - F# computation expression builder for System.Threading.Tasks. [CC0] ## Configuration * [Argu ★ 145 ⧗ 0](https://github.com/fsprojects/Argu) - Declarative CLI argument/XML configuration parser for F# applications. [MIT] * [docopt.fs ★ 18 ⧗ 0](https://github.com/docopt/docopt.fs/) - command line arguments parser, F# port of [docopt](https://github.com/docopt/docopt). [MIT] * [FsConfig ★ 14 ⧗ 1](https://github.com/demystifyfp/FsConfig) - F# library for reading configuration data from environment variables and AppSettings with type safety. [Unlicense] * [Skid ★ 3 ⧗ 0](https://github.com/Meyhem/Skid) - Simple, single-file portable CLI utility for configuration templating. 
[MIT] ## Data Science * [Deedle ★ 347 ⧗ 21](https://github.com/BlueMountainCapital/Deedle) - Deedle: Exploratory data library for .NET. [BSD-2-Clause] * [Deep.Net](http://www.deepml.net) - Deep learning library for F#. Provides symbolic model differentiation, automatic differentiation and compilation to CUDA GPUs. [Apache 2.0] * [DiffSharp ★ 106 ⧗ 70](https://github.com/DiffSharp/DiffSharp) - DiffSharp is a functional automatic differentiation (AD) library. [BSD-2-Clause] * [FsLab ★ 97 ⧗ 171](https://github.com/fslaborg/FsLab) - FsLab is a collection of libraries for data-science. It provides a rapid development environment that lets you write advanced analysis with few lines of production-quality code. [Apache 2.0] * [IfSharp * 272 ⧗ 1](https://github.com/fsprojects/IfSharp) - F# for Jupyter Notebooks. [BSD-3-Clause] * [m2cgen](https://github.com/BayesWitnesses/m2cgen) - A CLI tool to transpile trained classic ML models into a native F# code with zero dependencies. [MIT] * **[Math.NET Numerics ★ 1,923 ⧗ 0](https://github.com/mathnet/mathnet-numerics)** - Math.NET Numerics aims to provide methods and algorithms for numerical computations in science, engineering and every day use. F# specific bindings available. [MIT] * [Math.NET Symbolics ★ 203 ⧗ 5](https://github.com/mathnet/mathnet-symbolics/) - Math.NET Symbolics is a basic open source computer algebra library for .NET, Silverlight and Mono written entirely in F#. [MIT] * [SIMDArray ★ 42 ⧗ 11](https://github.com/jackmott/SIMDArray) - SIMD enhanced Array extensions for faster computation. [MIT] * [Synapses](https://github.com/mrdimosthenis/Synapses) - Neural network library in F#. [MIT] ## Development Tools ### IDE * [F# Playground](https://github.com/Seng-Jik/FSharpPlayground) - Minimal playground for F#. [GPL 3.0] * [Jetbrains Rider](https://www.jetbrains.com/rider) - Cross-Platform .Net IDE with F# support. 
[Proprietary, free for open source projects] * [MonoDevelop](http://www.monodevelop.com/) - Cross-platform IDE mostly aimed at Mono/.NET developers. [LGPLv2 and X11/MIT] * [Visual Studio](https://www.visualstudio.com/) - IDE from Microsoft with first class F# support(Windows only). [Proprietary] ### Editor plugins * [Emacs F# mode ★ 80 ⧗ 27](https://github.com/fsharp/emacs-fsharp-mode) - F# support in Emacs (including Intellisense and Interactive mode) [Apache 2.0] * [F# Bindings ★ 321 ⧗ 261](https://github.com/fsharp/fsharpbinding) - Archive of F# Language Bindings for Open Editors. [Apache 2.0] * [Fantomas ★ 472 ⧗ 115](https://github.com/fsprojects/fantomas) - F# code formatter. [Apache 2.0] * [FSharpLint ★ 223 ⧗ 55](https://github.com/fsprojects/FSharpLint) - F# code linter. [MIT] * [FSharpFar ★ 33 ⧗ 54](https://github.com/nightroman/FarNet) - F# support for Far Manager. [BSD-3-Clause] * [Ionide](http://ionide.io/) - Atom Editor and Visual Studio Code package suite for cross platform F# development. [MIT] * [Vim F# ★ 66 ⧗ 3](https://github.com/fsharp/vim-fsharp) - F# support for Vim. [MIT] * [neofsharp.vim](https://github.com/adelarsq/neofsharp.vim) - Basic F# support for (Neo)Vim [MIT] * [VimSpeak ★ 305 ⧗ 910](https://github.com/AshleyF/VimSpeak) - VimSpeak lets you control Vim with your voice using speech recognition. [MIT] * [Visual F# Power Tools ★ 310 ⧗ 53](https://github.com/fsprojects/VisualFSharpPowerTools) - Power commands for F# in Visual Studio. [Apache 2.0] * [fsharp-notebook](https://github.com/pablofrommars/fsharp-notebook) - Data Science Notebook for F# interactive. [MIT] ### Performance analysis * [fasm](https://github.com/d-edge/fasm) - F# jit disassembler, as a dotnet tool [MIT] ## General purpose libraries * [Aether ★ 71 ⧗ 0](https://github.com/xyncro/aether) - Optics library for F#, similar to the Haskell Data.Lens package. [MIT] * [Chessie ★ 96 ⧗ 272](https://github.com/fsprojects/Chessie) - Railway-oriented programming. 
[Unlicense] * [Donald](https://github.com/pimbrouwers/Donald) - A simple F# interface for ADO.NET. [Apache-2.0] * [DustyTables ★ 39 ⧗ 6](https://github.com/Zaid-Ajaj/DustyTables) - Thin F# API for SqlClient for easy data access to ms sql server with functional seasoning on top [MIT] * [ExtCore ★ 96 ⧗ 0](https://github.com/jack-pappas/ExtCore) - Extended core library for F#. [Apache 2.0] * [Fling](https://github.com/cmeeren/Fling) - Fling significantly reduces boilerplate needed to efficiently save/load complex domain entities to/from multiple tables. [MIT] * [FSharp.CosmosDb](https://github.com/aaronpowell/fsharp.cosmosdb) - An F# wrapper around the CosmosDB SDK, making it more functional-friendly [MIT] * [FSharp.HashCollections ★ 4 ⧗ 0](https://github.com/mvkara/fsharp-hashcollections) - Library providing fast hash based immutable map and set. [MIT] * [FSharpLu ★ 133 ⧗ 20](https://github.com/Microsoft/fsharplu) - Lightweight utilities for string manipulations, logging, collection data structures, file operations, text processing, security, async, parsing, diagnostics, configuration files and Json serialization. [MIT] * [FsToolkit.ErrorHandling](https://github.com/demystifyfp/FsToolkit.ErrorHandling) - Clear, simple and powerful error handling with railway-oriented programming. Inspired by Chessie. [MIT] * [Fumble ★ 30 ⧗ 0](https://github.com/tforkmann/Fumble) - Thin F# API for Sqlite for easy data access to sqlite database with functional seasoning on top [MIT] * [FSharpPlus ★ 142 ⧗ 34](https://github.com/gmpl/FSharpPlus) - Extensions for F#. [Apache 2.0] * [FSharpx.Extras ★ 589 ⧗ 28](https://github.com/fsprojects/FSharpx.Extras) - FSharpx.Extras is a collection of libraries and tools for use with F#. 
[Unlicense] * [LiteDB.FSharp](https://github.com/Zaid-Ajaj/LiteDB.FSharp) - F# Support for [LiteDB](https://github.com/mbdavid/LiteDB), an embedded single file database for .NET [MIT] * [Npgsql.FSharp](https://github.com/Zaid-Ajaj/Npgsql.FSharp) - Thin F# wrapper around [Npgsql](https://github.com/npgsql/npgsql), the PostgreSQL database driver [MIT] * [TypeShape ★ 64 ⧗ 0](https://github.com/eiriktsarpalis/TypeShape) - Small, extensible F# library for practical generic programming. [MIT] * [Validus](https://github.com/pimbrouwers/Validus) - A composable validation library for F#, with built-in validators for most primitive types and easily extended through custom validators. * [Vp.FSharp.Sql](https://github.com/veepee-oss/Vp.FSharp.Sql) - Generic F# ADO Provider Wrapper (SqlServer, PostgreSql, Sqlite). [MIT] ## Game development * [FsUnity](https://github.com/FsUnity) - F# Libraries, Tools, and Plugins for the Unity3d Game Engine. [Unilicense] * [Garnet ★ 15 ⧗ 6](https://github.com/bcarruthers/garnet) - Garnet is a lightweight game composition library for F# with entity-component-system (ECS) and actor-like messaging features. [MIT] * [Godot](http://www.lkokemohr.de/fsharp_godot.html) - Tutorial how to use F# with Godot. * **[Nu Game Engine ★ 502 ⧗ 9](https://github.com/bryanedds/Nu)** - Cross-platform F# 2D game engine built in the functional style. Uses SDL2 and Farseer Physics. [MIT] ## GUI * [Avalonia.FuncUI](https://github.com/fsprojects/Avalonia.FuncUI) - Develop cross-platform MVU GUI Applications using F# and Avalonia * [Epoxy](https://github.com/kekyo/epoxy) - An independent flexible XAML MVVM library for .NET * [Fabulous](https://github.com/fabulous-dev/Fabulous) - F# Functional App Development, using declarative dynamic UI ## HTTP Clients * [Http.fs](https://github.com/haf/Http.fs) - A simple, functional HTTP client library for F# * [FsHttp](https://github.com/ronaldschlenker/FsHttp) - A convenient library for consuming HTTP/REST endpoints via F#. 
[Apache 2.0] * [Oryx](https://github.com/cognitedata/oryx) - A high performance .NET cross platform functional HTTP request handler library for writing HTTP clients and orchestrating web requests. [Apache 2.0] ## Logging * [FsLibLog ★ 26 ⧗ 0](https://github.com/TheAngryByrd/FsLibLog) - FsLibLog is a single file you can copy paste or add through Paket Github dependencies to provide your F# library with a logging abstraction. [MIT] * [Logary ★ 259 ⧗ 0](https://github.com/logary/logary/) - Logary is a high performance, multi-target logging, metric, tracing and health-check library for mono and .Net. [Apache 2.0] ## Package Management * [NuGet](https://www.nuget.org/) - NuGet is the package manager for the Microsoft development platform including .NET. [Apache 2.0] * **[Paket ★ 903 ⧗ 0](https://github.com/fsprojects/Paket)** - Dependency manager for .NET with support for NuGet packages and Git repositories. [MIT] ## Parsing * [FParsec ★ 50 ⧗ 0](https://github.com/stephan-tolksdorf/fparsec) - FParsec is a parser combinator library for F#. [[BSD-2-Clause](http://www.quanttec.com/fparsec/license.html)] * [FsAttoparsec ★ 1 ⧗ 0](https://github.com/haf/FsAttoparsec) - Port of Bryan O'Sullivan's attoparsec from Haskell to F#. [BSD-3-Clause] * [XParsec ★ 29 ⧗ 2](https://github.com/corsis/XParsec) - Extensible, type-and-source-polymorphic, non-linear applicative parser combinator library for F# 3.0 and 4.0. [BSD-3-Clause] ## PreCompilation * [Myriad ★ 38 ⧗ 4](https://github.com/MoiraeSoftware/myriad) - Myriad is a pre-compilation code generator ## Serialization * [FsCodec ★ 21 ⧗ 7](https://github.com/jet/FsCodec) - F# Event-Union Contract Encoding with versioning tolerant converters. [Apache 2.0] * [FSharp.Json ★ 72 ⧗ 15](https://github.com/vsapronov/FSharp.Json) - F# JSON Reflection based serialization library. [Apache-2.0] * [FSharp.SystemTextJson ★ 36 ⧗ 0](https://github.com/Tarmil/FSharp.SystemTextJson) - System.Text.Json extensions for F# types. 
[MIT] * [Fleece ★ 94 ⧗ 76](https://github.com/mausch/Fleece) - Fleece is a JSON mapper for F#. It simplifies mapping from a Json library's JsonValue onto your types, and mapping from your types onto JsonValue. [Apache-2.0] * [FsPickler ★ 195 ⧗ 13](https://github.com/mbraceproject/FsPickler) - Fast, multi-format messaging serializer for .NET. [MIT] * [Legivel ★ 19 ⧗ 4](https://github.com/fjoppe/Legivel) - F# Yaml 1.2 parser. [Unlicense] * [Thoth.Json ★ 40 ⧗ 11](https://thoth-org.github.io/Thoth.Json/) - Json encoder/decoder library inspire by Elm. [MIT] ## Search * [FlexSearch ★ 133 ⧗ 14](https://github.com/flexsearch/flexsearch) - high performance REST/SOAP services based full-text searching platform built on top of the popular Lucene search library. [Apache 2.0] ## Simulation * [F# RISC-V Instruction Set formal specification](https://github.com/mrLSD/riscv-fs) - RISC-V CPU formal ISA Specification. RISC-V CPU simulator with ELF files execution. [MIT] ## Testing * [altcover ★ 139 ⧗ 0](https://github.com/SteveGilham/altcover) - Cross-platform coverage gathering and processing tool set for .net/.net core and Mono. [MIT] * [canopy ★ 304 ⧗ 2](https://github.com/lefthandedgoat/canopy) - F# web automation and testing framework. [MIT] * [Expecto ★ 124 ⧗ 2](https://github.com/haf/expecto) - Smooth testing framework for F# with tests-as-values and parallelism by default. [Apache 2.0] * [FsCheck ★ 415 ⧗ 34](https://github.com/fscheck/FsCheck) - Random Testing for .NET. [BSD-3-Clause] * [fsharp-hedgehog ★ 42 ⧗ 4](https://github.com/hedgehogqa/fsharp-hedgehog) - Property-based testing system for F#. [Apache 2.0] * [FsUnit ★ 340 ⧗ 86](https://github.com/fsprojects/FsUnit) - FsUnit makes unit-testing with F# more enjoyable. It adds a special syntax to your favorite .NET testing framework. [MIT] * [NBomber ★ 14 ⧗ 23](https://github.com/PragmaticFlow/NBomber) - simple load testing framework for Pull and Push scenarios. 
[Apache 2.0] * [Persimmon ★ 29 ⧗ 9](https://github.com/persimmon-projects/Persimmon) - Unit test framework for F# using computation expressions. [MIT] * [unquote ★ 88 ⧗ 17](https://github.com/swensensoftware/unquote) - Write F# unit test assertions as quoted expressions. [Apache 2.0] * [xUnit.net](https://xunit.github.io/) - Free, open source, community-focused unit testing tool for the .NET Framework. [Apache 2.0] ## Type providers * [ApiaryProvider ★ 9 ⧗ 380](https://github.com/fsprojects/ApiaryProvider) - Type provider for Apiary.io. [Apache 2.0] * [AzureStorageTypeProvider ★ 45 ⧗ 20](https://github.com/fsprojects/AzureStorageTypeProvider) - F# Azure Type Provider which can be used to explore Blob, Table and Queue Azure Storage assets and easily apply CRUD operations on them. [Unilicense] * [COM Type Provider ★ 36 ⧗ 330](https://github.com/fsprojects/FSharp.Interop.ComProvider) - Type provider for COM interop. [Unilicense] * [DynamicsCRMProvider ★ 8 ⧗ 380](https://github.com/fsprojects/DynamicsCRMProvider) - Type provider for Microsoft Dynamics CRM 2011. [Apache 2.0] * [ExcelProvider ★ 45 ⧗ 75](https://github.com/fsprojects/ExcelProvider) - Excel type provider. [Unilicense] * [Facil](https://github.com/cmeeren/Facil) - Facil generates F# data access source code from SQL queries and stored procedures. Optimized for developer happiness. [MIT] * [FSharp.Configuration ★ 60 ⧗ 6](https://github.com/fsprojects/FSharp.Configuration) - The project contains type providers for the configuration of .NET projects. Handles AppSettings, ResX, Yaml and Ini files. [Apache 2.0] * [FSharp.Data ★ 375 ⧗ 8](https://github.com/fsharp/FSharp.Data) - Data science library that contains type providers for CSV, HTML, JSON, XML, and WorldBank data. [Apache 2.0] * [FSharp.Data.DbPedia ★ 9 ⧗ 379](https://github.com/fsprojects/FSharp.Data.DbPedia) - F# type provider for DBpedia. 
[Unilicense] * [FSharp.Data.HiveProvider ★ 8 ⧗ 379](https://github.com/fsprojects/FSharp.Data.HiveProvider) - Demonstrator F# type provider for Apache Hive. [Apache 2.0] * [FSharp.Data.Npgsql ★ 6 ⧗ 1](https://github.com/demetrixbio/FSharp.Data.Npgsql) - F# type providers library on a top of well-known Npgsql ADO.NET client library. [Apache 2.0] * [FSharp.Data.SqlClient ★ 121 ⧗ 16](https://github.com/fsprojects/FSharp.Data.SqlClient) - F# Type Provider for statically typed access to T-SQL command parameters and result set. [Apache 2.0] * [FSharp.Data.Tdms ★ 0 ⧗ 1](https://github.com/mettekou/FSharp.Data.Tdms) - TDMS support for F# [MIT] * [FSharp.Data.Toolbox ★ 38 ⧗ 7](https://github.com/fsprojects/FSharp.Data.Toolbox) - Library for various data access APIs based on FSharp.Data. The library currently includes the Twitter type provider for access to Twitter users and feeds, and SAS type provider to read SAS dataset files. [Apache 2.0] * [FSharp.Data.TypeProviders ★ 9 ⧗ 379](https://github.com/fsprojects/FSharp.Data.TypeProviders) - Library that contains type providers for `.edmx` files, `.dbml` files, WSDL services, OData services, and SQL databases. [Unilicense] * [FSharp.Management ★ 59 ⧗ 1](https://github.com/fsprojects/FSharp.Management) - The project contains various type providers for the management of the machine. Handles file system, registry, Windows Management Instrumentation, PowerShell and SystemTimeZones. [Apache 2.0] * [FSharp.Text.RegexProvider ★ 29 ⧗ 285](https://github.com/fsprojects/FSharp.Text.RegexProvider) - Type provider for regular expressions. [Apache 2.0] * [FsXaml ★ 158 ⧗ 453](https://github.com/fsprojects/FsXaml) - F# Tools for working with XAML Projects. [MIT] * [FsYaml ★ 33 ⧗ 41](https://github.com/bleis-tift/FsYaml) - Typed Yaml library for F#. [NYSL Version 0.9982] * [GraphProvider ★ 21 ⧗ 379](https://github.com/fsprojects/GraphProvider) - `.dgml` state machine type provider. 
[Apache 2.0] * [MatDataProvider ★ 6 ⧗ 378](https://github.com/fsprojects/matprovider) - Erased type provider for `.mat` files (binary MATLAB format files). [Apache 2.0] * [R Type Provider ★ 159 ⧗ 365](https://github.com/BlueMountainCapital/FSharpRProvider) - Type provider to interop with R. [BSD-2-Clause] * [Rezoom.SQL ★ 7 ⧗ 0](https://github.com/rspeele/Rezoom.SQL) - Statically typed SQL for F#. [MIT] * [S3Provider ★ 16 ⧗ 379](https://github.com/fsprojects/S3Provider) - Experimental type provider for Amazon S3. [MIT] * [SQLProvider ★ 192 ⧗ 7](https://github.com/fsprojects/SQLProvider) - General F# SQL database erasing type provider, supporting LINQ queries, schema exploration, individuals, CRUD operations and much more besides. [Apache 2.0] * [SwaggerProvider ★ 81 ⧗ 3](https://github.com/fsprojects/SwaggerProvider) - F# generative Type Provider for Swagger. [Unilicense] ### Creating type providers * [FSharp.TypeProviders.StarterPack ★ 104 ⧗ 42](https://github.com/fsprojects/FSharp.TypeProviders.StarterPack) - The ProvidedTypes SDK for creating F# type providers. [Apache 2.0] * [RestProvider ★ 14 ⧗ 258](https://github.com/fsprojects/RestProvider) - Create type providers just by implementing a simple REST service. [Apache 2.0] ## Visualization * [FSharp.Charting ★ 186 ⧗ 0](https://github.com/fslaborg/FSharp.Charting) - Charting library suitable for interactive F# scripting. [MIT] * [SharpVG ★ 32 ⧗ 0](https://github.com/ChrisNikkel/SharpVG) - Create SVG vector graphics in F#. [MIT] * [XPlot ★ 173 ⧗ 0](https://github.com/fslaborg/XPlot) - A plotting library for the F# programming language. [Apache 2.0] * [GG.Net](https://github.com/pablofrommars/GGNet) - Visualization library for data scientists. [MIT] * [Plotly.NET](https://github.com/plotly/Plotly.NET) - A Plotly-based general purpose plotting library for F#. [MIT] ## Web frameworks * [Bolero ★ 629](https://github.com/fsbolero/Bolero/) - F# in WebAssembly, develop SPAs with the full power of F# and .NET Blazor. 
[Apache 2.0] * [Falco](https://github.com/pimbrouwers/Falco/) - A functional-first toolkit for building brilliant ASP.NET Core applications using F#. * [Felicity](https://github.com/cmeeren/Felicity) - Boilerplate-free, idiomatic JSON:API for your beautiful, idiomatic F# domain model. Optimized for developer happiness. [MIT] * [Freya ★ 241 ⧗ 7](https://github.com/xyncro/freya) - Modern, purely functional stack for web programming in F#. [Apache 2.0] * [Genit ★ 62 ⧗ 1](https://github.com/lefthandedgoat/genit) - Cross-platform website generator and server using F#, Suave and PostgreSQL or MS SQL Server. * [Giraffe ★ 526 ⧗ 49](https://github.com/giraffe-fsharp/Giraffe) - Native functional ASP.NET Core web framework for F# developers. [Apache 2.0] * [Saturn ★ 62 ⧗ 2](https://github.com/SaturnFramework/Saturn) - Opinionated, web development framework for F# which implements the server-side, functional MVC pattern. [MIT] * **[Suave ★ 756 ⧗ 2](https://github.com/SuaveIO/suave)** - Suave is a simple web development F# library providing a lightweight web server and a set of combinators to manipulate route flow and task composition. [Apache 2.0] * [WebSharper ★ 270 ⧗ 7](https://github.com/intellifactory/websharper) - F#-based web programming platform including a compiler from F# code to JavaScript. 
[Apache 2.0] ## .Net Core Templates  * [ASP.NET Core WebAPI F# Template](https://github.com/MNie/FSharpNetCoreWebApiTemplate) `dotnet new -i WebAPI.FSharp.Template::*`  * [Expecto Template](https://github.com/MNie/Expecto.Template) `dotnet new -i Expecto.Template::*`  * [Fable, F# |> Babel](http://fable.io) `dotnet new -i Fable.Template::*`  * [Fable-elmish](https://github.com/fable-compiler/fable-elmish) `dotnet new -i Fable.Template.Elmish.React::*`  * [Freya](https://freya.io) `dotnet new --install Freya.Template::*`  * [Giraffe Template](https://github.com/giraffe-fsharp/giraffe-template) (Quick install: `dotnet new -i ""giraffe-template::*""`)  * [MiniScaffold](https://github.com/TheAngryByrd/MiniScaffold) - F# Template for creating and publishing libraries targeting .NET Full (net45) and Core (netstandard1.6)  - `dotnet new -i MiniScaffold::*`  * [NancyFx Template](https://github.com/MNie/NancyFxCore) `dotnet new -i NancyFx.Core.Template::*`  * [SAFE Stack Template](https://github.com/SAFE-Stack/SAFE-template) `dotnet new -i SAFE.Template::*`  * [vbfox's F# Templates](https://github.com/vbfox/FSharpTemplates)  - F# Template for creating github project with appveyor and travis support ## Resources ### Blogs * [.NET Blog (F# tag)](https://devblogs.microsoft.com/dotnet/tag/f/) * [Codesuji](http://codesuji.com) * [Krzysztof Cieslak](http://kcieslak.io/) * [Mark Seemann](http://blog.ploeh.dk/) * [Sergey Tihon (F# Weekly)](https://sergeytihon.wordpress.com/) * [Tomas Petricek](http://tomasp.net/blog/) ### Books * [F# in Action](https://www.manning.com/books/f-sharp-in-action) ### Cheatsheets * [F# cheatsheet](http://fsprojects.github.io/fsharp-cheatsheet/) * [F# Snips](http://fssnip.net/) * [F# tour](https://docs.microsoft.com/en-us/dotnet/articles/fsharp/tour) * [Learn F# in Y minutes](https://learnxinyminutes.com/docs/fsharp) ### Community * [F# on Discourse](https://forums.fsharp.org/) * [F# on Github](https://github.com/fsharp/) * [F# on 
IRC](http://webchat.freenode.net/?channels=%23%23fsharp) * [F# on Slack](http://fsharp.org/guides/slack/) * [F# news on Telegram](https://t.me/fsharp_news) ### Other Lists * **[Awesome .NET! ★ 4458 ⧗ 2](https://github.com/quozd/awesome-dotnet)** - Collection of awesome .NET libraries, tools, frameworks and software. [CC0 1.0] * [Awesome Fable](https://github.com/kunjee17/awesome-fable) - Curated list of useful Fable tutorials, libraries and software. [CC0 1.0] * [F# Community Projects](http://fsharp.org/community/projects/) - FSharp community projects * [F# for fun and profit](https://fsharpforfunandprofit.com/) * [WTF#](https://wtfsharp.net) - podcast focused on F# and its ecosystem ### Websites * [Community for F#](http://c4fsharp.net/) - Links to dojos and recordings of community presentations. * [cs2fs](https://jindraivanek.gitlab.io/cs2fs-online) - Transform C# code to F# code * [Decompiler.com](https://www.decompiler.com/) - Online C#/VB/F# decompiler * [DotNetFiddle](https://dotnetfiddle.net/) - Online REPL * [F# Core Engineering](http://fsharp.github.io/) * [F# for Fun and Profit](https://fsharpforfunandprofit.com/) - Reference tutorials * [F# Software Foundation](http://fsharp.org/) - Main website * [fantomas-tools](https://fsprojects.github.io/fantomas-tools) - A set of Fantomas related tools like AST viewer and online bug reporter. * [SharpLab](https://sharplab.io/) - C#/VB/F# compiler playground. 
* [Try F#](http://www.tryfsharp.org) - Online tutorials, currently without execution of code due to Silverlight dependency ### Videos * [Austin F# Meetup Group Recorded Presentations](http://usergroup.tv/videos/category/group/austin-f-meetup) * [Intro to F#](https://www.youtube.com/watch?v=1ioGr701c5Q&list=PLqWncHdBPoD4YEWoXQlRj1tiTc96HZxH8) * [Fast Dictionary in F#](https://www.youtube.com/watch?v=KMR2y1vcO-s&list=PLqWncHdBPoD4-d_VSZ0MB0IBKQY0rwYLd) * [F# Chats on performance](https://www.youtube.com/watch?v=EIBRoNEpg6c&list=PLqWncHdBPoD4O1sr2Q3W9gAuJ30s09U2r) * [Topological Sort](https://www.youtube.com/playlist?list=PLqWncHdBPoD5hEK31CcfmTHP-7icw2Xd0) ### Courses * [Data programming with F#](https://www.udemy.com/course/data-programming-with-f/) * [F# workshop](http://www.fsharpworkshop.com/) * [Introduction to F#](https://fsharp.tv/courses/fsharp-programming-intro/) * [Write yourself a scheme in 48 hours using F#](https://write-yourself-a-scheme.pangwa.com/)"
awesome-knowledge-graph.md,"# Awesome Knowledge Graph [![Awesome](https://awesome.re/badge.svg)](https://awesome.re) > A curated list of Knowledge Graph related learning materials, databases, tools and other resources ## Contents * [Infrastructure](#infrastructure)  * [Graph Databases](#graph-databases)  * [Triple Stores](#triple-stores) * [Graph Computing Frameworks](#graph-computing-frameworks)  * [Graph Visualization](#graph-visualization)  * [Graph Construction](#graph-construction)  * [Languages](#languages)  * [Managed Hosting Services](#managed-hosting-services) * [Knowledge Engineering](#knowledge-engineering)  * [Knowledge Fusion](#knowledge-fusion) * [Knowledge Graph Dataset](#knowledge-graph-dataset)  * [General](#general)  * [Semantic Network](#semantic-network)  * [Academic](#academic) * [Learning Materials](#learning-materials)  * [Official Documentations](#official-documentations)  * [Community Effort](#community-effort) ## Infrastructure ### Graph Databases * [AgensGraph](https://bitnine.net/agensgraph/) - multi-model graph database with SQL and Cypher support based on PostgreSQL * [ArangoDB](https://www.arangodb.com/) - highly available Multi-Model NoSQL database * [Atomic-Server](https://crates.io/crates/atomic-server/) - open-source type-safe graph database server with GUI, written in rust. Supports [Atomic Data](docs.atomicdata.dev/), JSON & RDF. 
* [Blazegraph](https://github.com/blazegraph/database) - GPU accelerated graph database * [Cayley](https://github.com/cayleygraph/cayley) - open source database written in Go * [CosmosDB](https://docs.microsoft.com/en-us/azure/cosmos-db/graph-introduction) - cloud-based multi-model database with support for TinkerPop3 * [Dgraph](https://dgraph.io) - Fast, Transactional, Distributed Graph Database (open source, written in Go) * [DSE Graph](https://www.datastax.com/products/datastax-enterprise-graph) - Graph layer on top of DataStax Enterprise (Cassandra, SolR, Spark) * [TypeDB](https://vaticle.com/) - a database with a rich and logical type system. * [Graphd](https://github.com/google/graphd) - the Metaweb/Freebase Graph Repository * [JanusGraph](http://janusgraph.org) - an open-source, distributed graph database with pluggable storage and indexing backends * [Memgraph](https://memgraph.com/) - High Performance, In-Memory, Transactional Graph Database * [Neo4j](http://tinkerpop.apache.org/docs/currentg/#neo4j-gremlin) - OLTP graph database * [Sparksee](http://www.sparsity-technologies.com/#sparksee) - makes space and performance compatible with a small footprint and a fast analysis of large networks * [Stardog](http://stardog.com/) - RDF graph database with OLTP and OLAP support * [OrientDB](http://orientdb.com/orientdb/) - Distributed Multi-Model NoSQL Database with a Graph Database Engine * [TigerGraph](https://www.tigergraph.com) - a complete, distributed, parallel graph computing platform for enterprise, supporting web-scale data analytics in real-time. * [Nebula Graph](https://nebula-graph.io/) - A truly distributed, linear scalable, lightning-fast graph database, using SQL-like query language. * [HugeGraph](https://github.com/hugegraph/hugegraph) - An open source TinkerPop 3 compliant OLTP Graph Database with pluggable storage bakcend which is similar to JanusGraph. It also supports OLAP through Spark GraphX. 
* [Diffbot](https://diffbot.com/products/knowledge-graph) - One of three Western entities to crawl a majority of the web. Largest commercially available knowledge graph. * [Weaver](https://www.weaverhq.com/) - A graph database built on top of Postgres, which allows you to query the dataset in both SQL and graph query languages including SQL, SPARQL, and GraphQL. * [Kuzu](https://kuzudb.com/) - A highly scalable, extremely fast, and very easy-to-use embeddable graph database. * [CogDB](https://cogdb.io/) - A Micro Graph Database for Python Applications. * [TuGraph](https://www.tugraph.org/) - Graph database behinde Alipay. It has achieved the top-ranking performance in LDBC-SNB, a globally recognised benchmark test, surpassing competing solutions. ### Triple Stores * [AllegroGraph](https://franz.com/agraph/allegrograph/) - high-performance, persistent graph database that scales to billions of quads * [Apache Jena](https://jena.apache.org/) - open source Java framework for building Semantic Web and Linked Data applications * [Copernic](https://git.sr.ht/~amirouche/copernic) - Data, and its history, via change requests at scale * [Eclipse RDF4J](http://rdf4j.org/) - (formerly known as Sesame) is an open source Java framework for processing RDF data. This includes parsing, storing, inferencing and querying of/over such data. It offers an easy-to-use API that can be connected to all leading RDF storage solutions. It allows you to connect with SPARQL endpoints and create applications that leverage the power of linked data and Semantic Web. * [GraphDB](http://graphdb.ontotext.com/graphdb/) - enterprise ready Semantic Graph Database, compliant with W3C Standards * [Virtuoso](https://virtuoso.openlinksw.com/) - a ""Data Junction Box"" that drives enterprise and individual agility by deriving a Semantic Web of Linked Data from existing data silos * [Apache Marmotta](https://marmotta.apache.org/) - (retired Apache project) an open platform for linked data. 
* [Oxigraph](https://github.com/oxigraph/oxigraph) - a light wight triple store written in Rust. ### Graph Computing Frameworks * [Apache Giraph](https://giraph.apache.org/) - an iterative graph processing system built for high scalability * [Apache TinkerPop](https://tinkerpop.apache.org/) - a graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP) * [Apache Spark - GraphX](https://spark.apache.org/graphx/) - Apache Spark's API for graphs and graph-parallel computation * [Tencent Plato](https://github.com/tencent/plato) - a fast distributed graph computation and machine learning framework used by WeChat. * [Gradoop](https://github.com/dbs-leipzig/gradoop) - a distributed graph analytics framework based on Apache Flink ### Graph Visualization * [AntV G6](https://github.com/antvis/g6) - Simple, easy and complete high performance graph visualization engine written in JavaScript, from Ant Financial * [Graphistry](https://github.com/graphistry/pygraphistry) - An end-to-end GPU visual graph analytics engine (Nvidia RAPIDS.ai / Apache Arrow) with interfaces including JS/React, Python (Jupyter/StreamLit), REST, rich no-code/low-code UIs for various databases, and self + cloud hosting, from Graphistry. * [Gephi](https://gephi.org/) - Graph visualization platform software runs on Windows, Mac and Linux. * [KeyLines & ReGraph](https://cambridge-intelligence.com/) - Graph visualization tookits for JavaScript and React developer from Cambridge Intelligence. * [Linkurious](https://linkurio.us) - Linkurious is an enterprise ready on-premises graph visualization and analysis platform. * [Cytoscape](https://cytoscape.org/) - Open source graph visualization platform software runs on Windows, Mac and Linux. * [Cytoscape.js](https://js.cytoscape.org/) - Graph visualization tookit for JavaScript. * [Sigma.js](https://www.sigmajs.org/) - JavaScript library aimed at visualizing larger graphs. 
### Graph Construction * [Morph-KGC](https://github.com/morph-kgc/morph-kgc/) - Knowledge graph generation system with RML mappings. * [Termboard](https://termboard.com/) - A very simple graphical editor to create Terms and Relations. It can use ChatGPT, Google Bard or any other chatbot. Ideal for beginners wanting to make and share quick sketches. ### Languages * [Cypher](http://www.opencypher.org/) * [Gremlin](https://tinkerpop.apache.org/gremlin.html) * [SPARQL](https://en.wikipedia.org/wiki/SPARQL) * [GraphQL+-](https://docs.dgraph.io/query-language/) - The query language of Dgraph, which is based on Facebook's GraphQL * [GQL](https://gql.today/) - An initiative to create a standard query language for property graph database, just like SQL for relational database. ### Managed Hosting Services * [CosmosDB @ Microsoft](https://docs.microsoft.com/en-us/azure/cosmos-db/graph-introduction) - Azure Cosmos DB is Microsoft's globally distributed, multi-model (Key-value, Document, Column, Graph) database service. 
* [JanusGraph @ IBM Compose](https://www.compose.com/databases/janusgraph) * [JanusGraph @ Google Cloud Platform](https://cloud.google.com/solutions/running-janusgraph-with-bigtable) - JanusGraph on Google Kubernetes Engine backed by Google Cloud Bigtable * [JanusGraph @ Amazon Web Services Labs](https://github.com/awslabs/dynamodb-janusgraph-storage-backend) - The Amazon DynamoDB Storage Backend for JanusGraph * [Neo4j @ Graphene](https://www.graphenedb.com/) * [Neo4j @ Graph Story](https://www.graphstory.com/) - End-to-end Graph Database hosting for Community and Enterprise Neo4j with expert help for development * [Neptune @ Amazon Web Services](https://aws.amazon.com/neptune/) - a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets * [Graph Engine Service @ Huawei Cloud](https://www.huaweicloud.com/en-us/product/ges.html) - Fully-managed, distributed, at-scale graph query and analysis service that provides a visualized interactive analytics platform. * [Graph Database (beta) @ Aliyun (Alibaba Cloud)](https://www.aliyun.com/product/gdb) - highly reliable and available property graph database that supports ACID and TinkerPop Gremlin query language. * [Tencent Knowledge Graph @ Tencent Cloud](https://cloud.tencent.com/product/tkg) - One stop platform for Graph database, computing and visualization. Currently available in beta test and only in Chinese. * [WoordLift](https://wordlift.io/) - Easy-to-use SEO-focused Graph Database hosting for web and e-commerce websites running on Apache Marmotta. * [Baidu Knowledge Graph @ Baidu AI Platform](https://ai.baidu.com/solution/kgaas) - One-stop AI platform to build knowledge graph and its applications. 
* [Graphistry](https://github.com/graphistry/pygraphistry) - Cloud accounts for Graphistry end-to-end GPU-accelerated visual graph analytics projects ## Knowledge Engineering * [YAGA-NAGA](https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/) - Harvesting, Searching, and Ranking Knowledge from the Web ### Knowledge Fusion * [Dedupe](https://github.com/dedupeio/dedupe) - dedupe is a python library that uses machine learning to perform fuzzy matching, deduplication and entity resolution quickly on structured data. * [LIMES](https://github.com/dice-group/LIMES) - Link Discovery Framework for Metric Spaces. ## Knowledge Graph Dataset ### General * [BabelNet](https://babelnet.org/) - Both a **multilingual encyclopedic dictionary**, with lexicographic and encyclopedic coverage of terms, and a **semantic network** which connects concepts and named entities in a very large network of semantic relations, made up of about 16 million entries, called Babel synsets. Each Babel synset represents a given meaning and contains all the synonyms which express that meaning in a range of different languages. * [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page) - Wikidata is a free, collaborative, multilingual, secondary database, collecting structured data to provide support for Wikipedia, Wikimedia Commons, the other wikis of the Wikimedia movement, and to anyone in the world. * [Google Knowledge Graph](https://developers.google.com/knowledge-graph/) - Google’s Knowledge Graph has millions of entries that describe real-world entities like people, places, and things. * [Freebase](https://developers.google.com/freebase/) - Large scale knowledge base originally stated by Metaweb. Later aquired by Google and used in [Google Knowledge Graph](https://blog.google/products/search/introducing-knowledge-graph-things-not/). 
* [DBpedia](https://wiki.dbpedia.org/) - DBpedia is a crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects. * [XLore](https://xlore.org/) - A large-scale English-Chinese bilingual knowledge graph by structuring and integrating Chinese Wikipedia, English Wikipedia, French Wikipedia, and Baidu Baike. * [The GDELT Project](https://www.gdeltproject.org/) - The GDELT Project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images and events driving our global society every second of every day, creating a free open platform for computing on the entire world. * [YAGO](http://yago-knowledge.org/) - A huge semantic knowledge base, derived from [Wikipedia](http://en.wikipedia.org/), [WordNet](http://wordnet.princeton.edu/) and [GeoNames](http://www.geonames.org/). Currently, YAGO has knowledge of more than 10 million entities (like persons, organizations, cities, etc.) and contains more than 120 million facts about these entities. The source code of YAGO is in this Github [repo](https://github.com/yago-naga/yago3). * [Zhishi.me](http://zhishi.me/) - Knowledge Graph data extracted from the largest Chinese encyclopedias, [Baidu Baike](https://baike.baidu.com/), [Hudong Baike](https://www.baike.com/) and [Chinese Wikipedia](https://zh.wikipedia.org/). * [NELL](http://rtw.ml.cmu.edu/rtw/) - Never-Ending Language Learner, read the web and extract facts from text found in web pages continuously and improve itself. * [Golden Protocol](https://golden.xyz/) - A decentralized canonical knowledge graph. It is open, transparent, consensus, bounty enabled and built in the age of Web 3. 
### Semantic Network * [ConceptNet](http://conceptnet.io/) - ConceptNet is a freely-available semantic network, designed to help computers understand the meanings of words that people use. * [Microsoft Concept Graph](https://concept.research.microsoft.com/) - For Short Text Understanding * [OpenHowNet](https://openhownet.thunlp.org) - An Open Sememe-based Lexical Knowledge Base in Chinese. * [WordNet](http://wordnet.princeton.edu/) - A free large lexical database of English from Princeton University. ### Academic & Research * [AMiner](https://www.aminer.cn/) - Aminer aims to provide comprehensive search and mining services for researcher social networks. * [Microsoft Academic](https://academic.microsoft.com/) - Microsoft Academic (MA) employs advances in machine learning, semantic inference and knowledge discovery to help you explore scholarly information in more powerful ways than ever before. * [AceMap](https://www.acemap.info/) - Academic search engine based on knowledge graph which includes entities like paper, author, institution and etc. * [Semantic Scholar](https://www.semanticscholar.org/) - A free, AI-powered research tool for scientific literature. Collaborating with academic publishers to build a trustworthy and authoritative scientific knowledge graph. ### Other Domain * [Lynx](https://lynx-project.eu/) - an ecosystem of smart cloud services to better manage compliance, based on a Legal Knowledge Graph (LKG) which integrates and links heterogeneous compliance data sources including legislation, case law, standards and other private contracts. * [ResearchSpace](https://researchspace.org/) - A culture heritage knowledge graph from the British Museum. 
* [Unified Medical Language System (UMLS)](https://www.nlm.nih.gov/research/umls/index.html) - The UMLS integrates and distributes key terminology, classification and coding standards, and associated resources to promote creation of more effective and interoperable biomedical information systems and services, including electronic health records. * [DrugBank](https://go.drugbank.com/) - Knowledge base for drug interactions, pharmacology, chemical structures, targets, metabolism, and more. * [STRING](https://string-db.org/) - A database of known and predicted protein-protein interactions. ## Learning Materials ### Official Documentations * [Cypher](https://neo4j.com/developer/cypher-query-language/) - reference documentation * [Gremlin](http://tinkerpop.apache.org/docs/current/reference/#traversal) - reference documentation ### Community Effort * [Graph Book](https://github.com/krlawrence/graph) - TinkerPop3 centric book written by [Kelvin R. Lawrence](https://twitter.com/gfxman) * [SQL2Gremlin](http://sql2gremlin.com/) - transition from SQL to Gremlin by [Daniel Kuppitz](https://twitter.com/dkuppitz) * [The Gremlin Compendium](http://www.doanduyhai.com/blog/?p=13460) - minimum survival kit for any Gremlin user, 10 blog post series by [Doan DuyHai](https://twitter.com/doanduyhai) ## Conferences * [Graph Connect](http://graphconnect.com/) - powered by Neo4j * [Graph Day](http://graphday.com/) - an Independent Graph Conference from the Data Day folks * [Connected Data London](https://connected-data.london/) - Connected Data London brings together 160+ Artificial Intelligence, Semantic Technology, Linked Data and Graph Database innovators, thought leaders and practitioners annually in one great conference. The conference has expanded its themes and tracks, from its roots as the primary conference for Knowledge Graphs, Linked Data and Semantics to include related Graph Database and AI / Machine Learning technologies and practical use cases. 
## Contribute Contributions welcome! Read the [contribution guidelines](contributing.md) first. Some of the content were copied from other awesome lists: * [awesome-graph](https://github.com/jbmusso/awesome-graph) - Graph, the infrastructure for Knowledge Graph * [awesome-knowledge-graph](https://github.com/husthuke/awesome-knowledge-graph) - Knowledge graph related materials but all in Chinese ## License [![CC0](http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg)](http://creativecommons.org/publicdomain/zero/1.0) To the extent possible under law, Sitao Z. has waived all copyright and related or neighboring rights to this work."


## Extract entities

In this step, the entities are extracted from the raw data. But first, we need to set up the environment.

### Define some javascript resources

Imports and helper functions that make coding in JavaScript easier inside this notebook.

The circular replacer lets JavaScript objects be serialized with `JSON.stringify` even when they contain circular references, which helps with debugging.
marked and cheerio, both JavaScript packages, are used together: marked renders the markdown content to HTML, and cheerio loads that HTML into a queryable tree.

In [3]:
// Imports
marked = await import("https://cdn.jsdelivr.net/npm/marked/lib/marked.esm.js");
cheerio = await import("https://cdn.jsdelivr.net/npm/cheerio@1.0.0-rc.12/+esm");

// Helper Functions
getCircularReplacer = function (maxDepth) {
    const ancestors = [];
    let depth = 0;
    return function (key, value) {
        // `this` is the object that value is contained in,
        // i.e., its direct parent. Drop ancestors that are no longer
        // on the path to the current value before checking the depth,
        // otherwise the depth stays stale once the limit has been hit.
        while (ancestors.length > 0 && ancestors.at(-1) !== this) {
            ancestors.pop();
            depth--;
        }
        if (depth > maxDepth) {
            return "[Max depth reached]";
        }
        if (typeof value !== "object" || value === null) {
            return value;
        }
        if (ancestors.includes(value)) {
            return "[Circular]";
        }
        ancestors.push(value);
        depth++;
        return value;
    };
};
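
As a usage illustration, the sketch below shows how a replacer of this kind is passed to `JSON.stringify`. It is a simplified, self-contained variant without the depth limit, and `makeCircularReplacer` and `team` are hypothetical names invented for the example, not part of the notebook.

```javascript
// Minimal sketch: a replacer that marks circular references instead of throwing.
// `makeCircularReplacer` and `team` are hypothetical names used only here.
function makeCircularReplacer() {
  const ancestors = [];
  return function (key, value) {
    if (typeof value !== "object" || value === null) {
      return value;
    }
    // `this` is the direct parent of `value`; drop ancestors that are
    // no longer on the path from the root to the current value.
    while (ancestors.length > 0 && ancestors.at(-1) !== this) {
      ancestors.pop();
    }
    if (ancestors.includes(value)) {
      return "[Circular]";
    }
    ancestors.push(value);
    return value;
  };
}

const team = { name: "fsharp", members: [{ name: "Ada" }] };
team.members[0].team = team; // the member points back at the team: a cycle

// JSON.stringify(team) without a replacer would throw a TypeError here.
const serialized = JSON.stringify(team, makeCircularReplacer());
console.log(serialized);
```

The replacer is created fresh for every `JSON.stringify` call because it keeps state (the ancestor stack) between invocations.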

### Perform entity extraction

marked and cheerio are used to extract entities from the markdown files.
The following code extracts the headers, name, description and link of each list item in a markdown document. Records are grouped by filename so that source attribution is preserved.
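
To make the target record shape concrete before the full cell, here is a dependency-free sketch. It is not the notebook's marked + cheerio pipeline: it swaps in a regular expression over `* [name](link) - description` lines and only tracks the most recent heading. `extractRecords` and `sample` are hypothetical names, and unlike the cell below, `description` here holds only the text after the dash rather than the full list item text.

```javascript
// Dependency-free sketch (NOT the marked + cheerio approach used in the notebook):
// a regular expression pulls `* [name](link) - description` items out of
// awesome-list style markdown and remembers the most recent heading.
// `extractRecords` and `sample` are hypothetical names for this example.
function extractRecords(markdown) {
  const records = [];
  let currentHeader = null;
  for (const line of markdown.split("\n")) {
    const header = line.match(/^#{1,6}\s+(.*)$/);
    if (header) {
      currentHeader = header[1].trim();
      continue;
    }
    const item = line.match(/^\s*[*-]\s+\[([^\]]+)\]\(([^)]+)\)(?:\s+-\s+(.*))?$/);
    if (item) {
      records.push({
        name: item[1],
        link: item[2],
        description: item[3] ?? "", // items without a description get an empty string
        headers: currentHeader ? [currentHeader] : [],
      });
    }
  }
  return records;
}

const sample = [
  "### Graph Databases",
  "* [Dgraph](https://dgraph.io) - Fast, Transactional, Distributed Graph Database (open source, written in Go)",
  "* [Cypher](http://www.opencypher.org/)",
].join("\n");

const records = extractRecords(sample);
console.log(JSON.stringify(records, null, 2));
```

A line-based regular expression is easy to follow but misses nested lists, bold-wrapped links and multi-line items, which is one reason to parse the markdown into a tree instead.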

In [4]:
#!set --value @fsharp:fileDict --name fileDict

//console.log(Object.entries(fileDict).map(x => x.length));
results = {};

for (const entry of Object.entries(fileDict)) {
    const [filename, content] = entry;
    
    console.log(filename);
    const $ = cheerio.load(marked.parse(content));

    // Collect the list items of every top-level unordered list.
    // An array is used rather than an object keyed by the cheerio node: object keys
    // are coerced to strings, so two identical lists would overwrite each other.
    const unorderedLists = [];
    // Find all unordered lists that are not nested inside of another unordered list (Top Level)
    $('ul:not(ul ul)').each(function(i, elem) {
        const listItems = [];

        // Find all list items inside of the unordered list (nested items included)
        $(this).find('li').each(function(i, elem) {
            listItems.push($(this));
        });
        unorderedLists.push(listItems);
    });

    const result = unorderedLists.map(listItems => listItems.map(li => {
        const link = $(li).find('a').attr('href');
        const description = $(li).text();
        const name = $(li).find('a').text();

        // Find all headers that are above the list item
        let headers = [];
        for (let i = 6; i >= 1; i--) {
            const header = $(li).parents().prevAll(`h${i}`).first();
            if (header.length) {
                headers.push(header.text());
            }
        }
        return { link, description, name, headers };
    })).flat();

    results[filename] = result;
}

console.log(JSON.stringify(results, null, 2));

results = Object.entries(results).map(x => {
    return {
        filename: x[0],
        data: x[1]
    };
});


awesome-fsharp.md

awesome-knowledge-graph.md

{
  "awesome-fsharp.md": [
    {
      "link": "#",
      "description": "Awesome F#\nF# wrappers for popular .NET libraries\nActor frameworks\nBuild tools\nCloud\nCompilers\nConcurrent, asynchronous and parallel programming\nConfiguration\nData Science\nDevelopment Tools\nIDE\nEditor plugins\nPerformance analysis\n\n\nGame development\nGeneral purpose libraries\nGUI\nHTTP Clients\nLogging\nPackage Management\nParsing\nPreCompilation\nSearch\nSerialization\nSimulation\nTesting\nType providers\nCreating type providers\n\n\nVisualization\nWeb frameworks\n.Net Core Templates\nResources\nBlogs\nCheatsheets\nCommunity\nOther Lists\nWebsites\nVideos\nCourses\n\n\n\n",
      "name": "Awesome F#F# wrappers for popular .NET librariesActor frameworksBuild toolsCloudCompilersConcurrent, asynchronous and parallel programmingConfigurationData ScienceDevelopment ToolsIDEEditor pluginsPerformance analysisGame developmentGeneral purpose librariesGUIHTTP ClientsLoggingPackage ManagementParsingPreCompil

## Save output to file

Now that the entities have been extracted from the markdown files into a unified dataset, grouped by source file, save the output of the scraping step to a single JSON file.

I don't know how to access file I/O in JavaScript, so going back to F#.

In [5]:
#!set --value @javascript:results --name results
open System.IO
open System.Text.Json

let filePath = "../data/scrapped/scrapped-dataset.json"
let json = JsonSerializer.Serialize(results, JsonSerializerOptions(WriteIndented = true))

let directoryPath = Path.GetDirectoryName(filePath)
// Create the output directory if it does not exist yet.
// `not (Directory.Exists(directoryPath))` is equivalent to `not <| Directory.Exists(directoryPath)`.
if not (Directory.Exists(directoryPath)) then
    Directory.CreateDirectory(directoryPath) |> ignore

File.WriteAllText(filePath, json)
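
For comparison, the same save step could look roughly like this in JavaScript, assuming a Node.js runtime. That is an assumption: the notebook's JavaScript kernel may not expose Node's `fs` module, which is why the cell above uses F#. `exampleResults` is a hypothetical stand-in for the real `results` value, and the file is written to a temporary directory instead of `../data/scrapped`.

```javascript
// Sketch of the save step for a Node.js runtime (an assumption: the notebook's
// JavaScript kernel may not provide Node's `fs` module).
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical stand-in for the `results` array built in the extraction cell.
const exampleResults = [
  { filename: "awesome-knowledge-graph.md", data: [{ name: "Dgraph", link: "https://dgraph.io" }] },
];

// A temporary directory keeps the example from touching ../data/scrapped.
const outputDir = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "scrape-")), "scrapped");
const outputPath = path.join(outputDir, "scrapped-dataset.json");

// `recursive: true` creates missing parents and does not fail if the directory exists.
fs.mkdirSync(outputDir, { recursive: true });
fs.writeFileSync(outputPath, JSON.stringify(exampleResults, null, 2), "utf8");

// Read the file back to confirm the round trip.
const roundTripped = JSON.parse(fs.readFileSync(outputPath, "utf8"));
console.log(roundTripped[0].filename);
```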