# Find the most valuable domain
I am trying to find the most valuable domain for myself, but I don't know which one has the highest value.

To narrow it down, I first take the 3000 most common English words, then generate a list of candidate domains by prepending the prefix `bit` to each word and appending `.com`, `.ai`, or `.io`. For example: `bitapple.com`, `bitapple.ai`, `bitapple.io`. I then use [BitSky](https://bitsky.io) to check whether each domain is available on [godaddy.com](https://www.godaddy.com/) and, if it is, to collect its estimated value.
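
The generation step can be sketched as follows. This is a minimal sketch, not the script actually used for this notebook: the helper name `generate_domains` and the two-word sample list are hypothetical, and a real run would load the full word list from a file.

```python
from itertools import product

def generate_domains(words, prefix="bit", tlds=(".com", ".ai", ".io")):
    """Build candidate domains as prefix + word + TLD (hypothetical helper)."""
    # Normalize the words: strip whitespace, lowercase, skip empty entries.
    cleaned = [w.strip().lower() for w in words if w.strip()]
    # Pair every word with every TLD.
    return [f"{prefix}{word}{tld}" for word, tld in product(cleaned, tlds)]

# Tiny sample standing in for the 3000-word list.
sample_words = ["apple", "sky"]
candidates = generate_domains(sample_words)
print(candidates)
```

Each word yields one candidate per TLD, so the list grows linearly with the number of words and TLDs.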

## Mount Google Drive folder
Mount **Google Drive** at the `/gdrive` folder.

In [31]:
from google.colab import drive
drive.mount('/gdrive')

Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).


## Read generated available domains file
Each domain record looks like this:
```json
{
  "_id": { "$oid": "5f260c5922ad29001764048b" },
  "domain": "bitnarratology.com",
  "price": 99.99,
  "value": 745,
  "reasons": [
    {
      "dev_only": false,
      "rank": 8,
      "Type": "great_extension",
      "Text": "This tld is the most popular one: .com.",
      "Title": "Great extension"
    },
    {
      "dev_only": false,
      "rank": 9,
      "Sld": "bitnarratology",
      "Type": "short",
      "Text": "The sld is 15 characters or less.",
      "Title": "Short"
    }
  ]
}
```

In [91]:
import pandas as pd
import numpy as np
domainsDF = pd.read_json('/gdrive/My Drive/domain/domains.json')

## Overview `domains.json`

In [92]:
domainsDF.describe()

Unnamed: 0,price,value
count,448287.0,448287.0
mean,99.99,318.013333
std,1.620039e-11,443.950877
min,99.99,0.0
25%,99.99,100.0
50%,99.99,100.0
75%,99.99,265.0
max,99.99,2237.0


In [93]:
domainsDF.shape

(448287, 5)

In [94]:
domainsDF.head(10)

Unnamed: 0,_id,domain,price,value,reasons
0,5f260c5922ad290017640487,bitnarrator.com,99.99,1255,"[{'dev_only': False, 'rank': 3, 'Domain': 'bit..."
1,5f260c5922ad290017640488,bitnarratology.io,99.99,100,"[{'dev_only': False, 'rank': 9, 'Sld': 'bitnar..."
2,5f260c5922ad290017640489,bitnarrator.io,99.99,100,"[{'dev_only': False, 'rank': 3, 'Domain': 'bit..."
3,5f260c5922ad29001764048a,bitnarratology.ai,99.99,100,"[{'dev_only': False, 'rank': 9, 'Sld': 'bitnar..."
4,5f260c5922ad29001764048b,bitnarratology.com,99.99,745,"[{'dev_only': False, 'rank': 8, 'Type': 'great..."
5,5f260c5922ad29001764048c,bitnarratives.io,99.99,100,"[{'dev_only': False, 'rank': 1, 'Keyword': 'na..."
6,5f260c5922ad29001764048d,bitnarrators.com,99.99,1251,"[{'dev_only': False, 'rank': 3, 'Domain': 'bit..."
7,5f260c5922ad29001764048e,bitnarrations.ai,99.99,100,"[{'dev_only': False, 'rank': 3, 'Domain': 'bit..."
8,5f260c5c22ad29001764048f,bitnaomi.com,99.99,1193,"[{'dev_only': False, 'rank': 1, 'Keyword': 'bi..."
9,5f260c5c22ad290017640490,bitnaples.com,99.99,1382,"[{'dev_only': False, 'rank': 8, 'Type': 'great..."


## Data Clean
1. Remove the `_id` and `reasons` columns
2. Trim `domain` column
3. Remove duplicate domains

### Remove `_id` and `reasons`
`_id` and `reasons` are not needed to find the highest-value domain, so remove them.



In [95]:
domainsDF = domainsDF.drop(['_id', 'reasons'], axis=1)

### Trim `domain` column

In [96]:
domainsDF['domain'] = domainsDF['domain'].str.strip()
domainsDF.head()

Unnamed: 0,domain,price,value
0,bitnarrator.com,99.99,1255
1,bitnarratology.io,99.99,100
2,bitnarrator.io,99.99,100
3,bitnarratology.ai,99.99,100
4,bitnarratology.com,99.99,745


### Remove duplicate domains

In [97]:
domainsDF = domainsDF.drop_duplicates(subset=['domain'])
domainsDF.shape

(162466, 3)
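
The three cleaning steps above can also be combined into one helper. The following is a minimal sketch on a made-up three-row sample: the helper name `clean_domains` and the placeholder `_id` values are hypothetical, and unlike the cells above it resets the index after removing duplicates.

```python
import pandas as pd

def clean_domains(df):
    """Drop unused columns, trim domain names, and remove duplicate domains."""
    # errors="ignore" keeps the helper usable if the columns are already gone.
    df = df.drop(columns=["_id", "reasons"], errors="ignore")
    # Trim before de-duplicating so whitespace variants collapse to one domain.
    df = df.assign(domain=df["domain"].str.strip())
    # Keep the first occurrence of each domain.
    return df.drop_duplicates(subset=["domain"]).reset_index(drop=True)

# Made-up sample: the second row duplicates the first except for whitespace.
sample = pd.DataFrame({
    "_id": ["a", "b", "c"],  # placeholder ids, not real ones
    "domain": ["bitsky.io", " bitsky.io ", "bitlion.io"],
    "price": [99.99, 99.99, 99.99],
    "value": [1928, 1928, 2237],
    "reasons": [[], [], []],
})
cleaned = clean_domains(sample)
print(cleaned)
```

The order matters: trimming first ensures that `" bitsky.io "` and `"bitsky.io"` are treated as the same domain when duplicates are dropped.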

## Cost Performance
The domain value is based on [GoDaddy Domain Name Value & Appraisal](https://www.godaddy.com/domain-value-appraisal). Three cost-performance metrics are derived from it:

1. `value_per_dollar`: $Estimated\ Value/Price$
2. `value_per_character`: $Estimated\ Value/Domain\ Length$
3. `value_per_dollar_character`: $(Estimated\ Value/Price)/Domain\ Length$

> * `Estimated Value`: `value` field
> * `Price`: `price` field
> * `Domain Length`: `domain_length` field, the length of the full domain string including the dot and the extension (for example, `bitnarrator.com` has length 15)
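
As a worked example of the three formulas, the sketch below computes them for a single domain, using the `value` and `price` that appear for `bitlion.io` in the tables further down. The function name `cost_performance` is hypothetical and is not part of the notebook.

```python
def cost_performance(domain, value, price):
    """Return the three cost-performance metrics for one domain."""
    # Length of the full string, so the dot and the extension are counted.
    length = len(domain)
    value_per_dollar = value / price
    return {
        "value_per_dollar": value_per_dollar,
        "value_per_character": value / length,
        "value_per_dollar_character": value_per_dollar / length,
    }

metrics = cost_performance("bitlion.io", value=2237, price=99.99)
print({name: round(x, 6) for name, x in metrics.items()})
```

Rounded to six decimals, the results should agree with the `bitlion.io` row in the sorted tables.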

In [98]:
domainsDF['domain_length'] = domainsDF['domain'].str.len()
domainsDF['domain_type'] = domainsDF['domain'].str.split('.').str[-1]
domainsDF['value_per_dollar_character'] = domainsDF['value']/domainsDF['price']/domainsDF['domain_length']
domainsDF['value_per_character'] = domainsDF['value']/domainsDF['domain_length']
domainsDF['value_per_dollar'] = domainsDF['value']/domainsDF['price']
domainsDF.head()

Unnamed: 0,domain,price,value,domain_length,domain_type,value_per_dollar_character,value_per_character,value_per_dollar
0,bitnarrator.com,99.99,1255,15,com,0.83675,83.666667,12.551255
1,bitnarratology.io,99.99,100,17,io,0.058829,5.882353,1.0001
2,bitnarrator.io,99.99,100,14,io,0.071436,7.142857,1.0001
3,bitnarratology.ai,99.99,100,17,ai,0.058829,5.882353,1.0001
4,bitnarratology.com,99.99,745,18,com,0.41393,41.388889,7.450745


### Cost performance based on Domain Length

In [102]:
domainsDF = domainsDF.sort_values(by='value_per_character', ascending=False)
pd.set_option('display.max_rows', 500)
domainsDF.head(20)

Unnamed: 0,domain,price,value,domain_length,domain_type,value_per_dollar_character,value_per_character,value_per_dollar
9796,bitlion.io,99.99,2237,10,io,2.237224,223.7,22.372237
17880,bitins.io,99.99,1944,9,io,2.160216,216.0,19.441944
410326,bittop.io,99.99,1932,9,io,2.146881,214.666667,19.321932
97839,bitsky.io,99.99,1928,9,io,2.142436,214.222222,19.281928
405685,bitam.io,99.99,1678,8,io,2.09771,209.75,16.781678
4815,bitmen.io,99.99,1885,9,io,2.094654,209.444444,18.851885
408183,bitnew.io,99.99,1831,9,io,2.034648,203.444444,18.311831
113159,bittag.io,99.99,1776,9,io,1.973531,197.333333,17.761776
25203,bitgreen.io,99.99,2136,11,io,1.942012,194.181818,21.362136
407211,bitace.io,99.99,1728,9,io,1.920192,192.0,17.281728


### Cost performance based on Price

In [103]:
domainsDF = domainsDF.sort_values(by='value_per_dollar', ascending=False)
pd.set_option('display.max_rows', 500)
domainsDF.head(20)

Unnamed: 0,domain,price,value,domain_length,domain_type,value_per_dollar_character,value_per_character,value_per_dollar
9796,bitlion.io,99.99,2237,10,io,2.237224,223.7,22.372237
12064,bitlattice.io,99.99,2199,13,io,1.691708,169.153846,21.992199
70842,bitpelican.com,99.99,2149,14,com,1.535154,153.5,21.492149
25203,bitgreen.io,99.99,2136,11,io,1.942012,194.181818,21.362136
116946,bitsystem.io,99.99,2095,12,io,1.746008,174.583333,20.952095
342010,bitvalue.io,99.99,2088,11,io,1.898372,189.818182,20.882088
341993,bitvalley.io,99.99,2087,12,io,1.739341,173.916667,20.872087
337190,bittrawler.com,99.99,2018,14,com,1.441573,144.142857,20.182018
394356,bitcycle.io,99.99,1997,11,io,1.815636,181.545455,19.971997
30392,bitfinder.io,99.99,1992,12,io,1.660166,166.0,19.921992


### Cost performance based on Price and Domain Length

In [104]:
domainsDF = domainsDF.sort_values(by='value_per_dollar_character', ascending=False)
pd.set_option('display.max_rows', 500)
domainsDF.head(20)

Unnamed: 0,domain,price,value,domain_length,domain_type,value_per_dollar_character,value_per_character,value_per_dollar
9796,bitlion.io,99.99,2237,10,io,2.237224,223.7,22.372237
17880,bitins.io,99.99,1944,9,io,2.160216,216.0,19.441944
410326,bittop.io,99.99,1932,9,io,2.146881,214.666667,19.321932
97839,bitsky.io,99.99,1928,9,io,2.142436,214.222222,19.281928
405685,bitam.io,99.99,1678,8,io,2.09771,209.75,16.781678
4815,bitmen.io,99.99,1885,9,io,2.094654,209.444444,18.851885
408183,bitnew.io,99.99,1831,9,io,2.034648,203.444444,18.311831
113159,bittag.io,99.99,1776,9,io,1.973531,197.333333,17.761776
25203,bitgreen.io,99.99,2136,11,io,1.942012,194.181818,21.362136
407211,bitace.io,99.99,1728,9,io,1.920192,192.0,17.281728
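
This ranking matches the one by `value_per_character` because every row in this dataset has the same price (99.99, per the `describe()` output), so dividing by price rescales the metric without changing the order. A minimal sketch of that check, using five rows copied from the tables above rather than the full file:

```python
import pandas as pd

# Five rows taken from the tables above; price is identical for all of them.
rows = pd.DataFrame({
    "domain": ["bitlion.io", "bitins.io", "bitlattice.io", "bitpelican.com", "bitam.io"],
    "price": [99.99] * 5,
    "value": [2237, 1944, 2199, 2149, 1678],
})
rows["domain_length"] = rows["domain"].str.len()
rows["value_per_character"] = rows["value"] / rows["domain_length"]
rows["value_per_dollar_character"] = rows["value"] / rows["price"] / rows["domain_length"]

# Rank the domains by each metric and compare the resulting orders.
by_character = rows.sort_values("value_per_character", ascending=False)["domain"].tolist()
by_dollar_character = rows.sort_values("value_per_dollar_character", ascending=False)["domain"].tolist()
print(by_character == by_dollar_character)
```

If prices differed between domains or extensions, the two rankings could diverge and the combined metric would carry extra information.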
