Sorularin cozumleri solutions klasoru altindaki readme dosyasinda detayli bir sekilde anlatilmistir. ayni zamanda farkli klasorlerin altinda daha detayli bilgiye ulasmak icin o bolumlerdeki readme dosyalari okunmasi onerilir.
Kurulum icin pip paket yukleyicisini kurun.
Proje icin virtual environment siddetle onerilir:
python3 -m venv venv
Asagidaki kod kurulum icin ihtiyaciniz olan kutuphanelerin kurulumunu yapacaktir.
pip install -r requirements.txt
- mariadb
- requests
- beautifulsoup4
- mysql-connector-python
- flask
- gunicorn
brew install --cask mysql-connector-python
Mac cihazlar icin en kolay MySQL kurulumu brew uzerinden yapilmaktadir.
- MySQL Kurulumu ve servisin baslatilmasi:
brew install mysql
brew services start mysql
mysql_secure_installation
Projeyi dockerize etmek icin gerekli altyapi ve dosyalar mevcut fakat gecici bir sureligine iptal edilmistir.
Heroku deploymenti icin dizin icerisinde Procfile, requirements.txt ve x adinda uc dosya bulunmaktadir. Bunlar heroku tarafinda olusturulacak container, paket bagimliliklari ve port aktarimlari icin yazilmistir. Detayli bilgi icin heroku dokumantasyonu.
IMDb Datasetleri linkten veya dataset_download.py scripti ile indirilebilir.
- Indirilen dosyalari indirmek ve extract etmek icin:
python3 -m dataset_download.py
python3 -m datasets/unzip_files.py
- MySQL Arayuzune giris:
mysql -u root -p
- Database'i yaratin ve tablolari olusturun:
source /Users/mertcobanoglu/Repos/demc-homework/database/initialize_tables.sql
- Verileri tablolara import edin
source /Users/mertcobanoglu/Repos/demc-homework/database/import_data.sql
Heroku deploymenti icin uyumlu olmasi adina flask appini ana dizinden calistirmak kolaylik sagliyor. Ayni zamanda webapp asagidaki komut ile calistirilabilir. localhost:5000
adresinden ziyaret edilecektir.
python3 wsgi.py
Heroku deploymenti icin database klasoru altindaki db_utils.py dosyasinda veritabani baglanti degiskenleri AWS RDS'den alinan bilgilerle degistirilmelidir. Bu bilgiler db bilgilerini kaynak koda eklememek adina config vars altinda environmental variables olarak girilmesi tavsiye edilir.
docker run --name imdb -p 3306:3306 -e MYSQL_ROOT_PASSWORD=55347314 linuxserver/mariadb
brew install mariadb
pip3 install mariadb
Source: https://www.imdb.com/interfaces/
IMDb Dataset DetailsEach dataset is contained in a gzipped, tab-separated-values (TSV) formatted file in the UTF-8 character set. The first line in each file contains headers that describe what is in each column. A ‘\N’ is used to denote that a particular field is missing or null for that title/name. The available datasets are as follows:
title.akas.tsv.gz
-
titleId (string) - a tconst, an alphanumeric unique identifier of the title
-
ordering (integer) – a number to uniquely identify rows for a given titleId
-
title (string) – the localized title
-
region (string) - the region for this version of the title
-
language (string) - the language of the title
-
types (array) - Enumerated set of attributes for this alternative title. One or more of the following: "alternative", "dvd", "festival", "tv", "video", "working", "original", "imdbDisplay". New values may be added in the future without warning
-
attributes (array) - Additional terms to describe this alternative title, not enumerated
-
isOriginalTitle (boolean) – 0: not original title; 1: original title
title.basics.tsv.gz
-
tconst (string) - alphanumeric unique identifier of the title
-
titleType (string) – the type/format of the title (e.g. movie, short, tvseries, tvepisode, video, etc)
-
primaryTitle (string) – the more popular title / the title used by the filmmakers on promotional materials at the point of release
-
originalTitle (string) - original title, in the original language
-
isAdult (boolean) - 0: non-adult title; 1: adult title
-
startYear (YYYY) – represents the release year of a title. In the case of TV Series, it is the series start year
-
endYear (YYYY) – TV Series end year. ‘\N’ for all other title types
-
runtimeMinutes – primary runtime of the title, in minutes
-
genres (string array) – includes up to three genres associated with the title
title.crew.tsv.gz
-
tconst (string) - alphanumeric unique identifier of the title
-
directors (array of nconsts) - director(s) of the given title
-
writers (array of nconsts) – writer(s) of the given title
title.episode.tsv.gz
-
tconst (string) - alphanumeric identifier of episode
-
parentTconst (string) - alphanumeric identifier of the parent TV Series
-
seasonNumber (integer) – season number the episode belongs to
-
episodeNumber (integer) – episode number of the tconst in the TV series
title.principals.tsv.gz
-
tconst (string) - alphanumeric unique identifier of the title
-
ordering (integer) – a number to uniquely identify rows for a given titleId
-
nconst (string) - alphanumeric unique identifier of the name/person
-
category (string) - the category of job that person was in
-
job (string) - the specific job title if applicable, else '\N'
-
characters (string) - the name of the character played if applicable, else '\N'
title.ratings.tsv.gz
-
tconst (string) - alphanumeric unique identifier of the title
-
averageRating – weighted average of all the individual user ratings
-
numVotes - number of votes the title has received
name.basics.tsv.gz
-
nconst (string) - alphanumeric unique identifier of the name/person
-
primaryName (string)– name by which the person is most often credited
-
birthYear – in YYYY format
-
deathYear – in YYYY format if applicable, else '\N'
-
primaryProfession (array of strings)– the top-3 professions of the person
-
knownForTitles (array of tconsts) – titles the person is known for
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.