# Hispasonic (from web to csv)


<br>

Hispasonic is a website about musical instruments, recording gear, and everything related to the world of music. It also hosts a second-hand market where users sell, buy, trade, or give away their musical instruments.

This first part of the project focuses on obtaining the relevant ad information. The category I have focused on is electronic musical instruments.

<br>

Before we start collecting information, we must first understand how the ad listing page is organized.

***

- *Image of one of the pages of hispasonic*


![hispa_1e.png](images/hispa_1e.png)

<br>

<br>

We can see several important things:

- The selected category is "teclados y sintetizadores" (keyboards and synthesizers).

- The pagination tells us how many pages we need to go through to get **all the ads**.



## 1. Function library loading.

In [1]:
import requests               # Is an elegant and simple HTTP library for Python
from bs4 import BeautifulSoup # library for pulling data out of HTML and XML files
import re                     # regular expressions operations
import pandas as pd           # A fast, powerful, flexible and easy to use open source data analysis tool
import os                     # A versatile way to use operating system-dependent functionality.
import datetime as dt         # module for manipulating dates and times.
import time

pd.set_option("display.max_rows", None)

### First contact

First of all, we must know whether we get a proper response from the server.

In [2]:
%%html 
<style>
table {float:left}
</style>

These are the main possible answers we can get from the server:

|||
|:--|:--|
|**1xx informational response –** |the request was received, continuing process|
|**2xx successful –** |the request was successfully received, understood, and accepted|
|**3xx redirection –** |further action needs to be taken in order to complete the request|
|**4xx client error –** |the request contains bad syntax or cannot be fulfilled|
|**5xx server error –** |the server failed to fulfil an apparently valid request|

In [3]:
# Enter the address and see the response from the server.

url = "https://www.hispasonic.com/anuncios/teclados-sintetizadores"
page = requests.get(url)
page

<Response [200]>

#### *<Response [200]> means correct connection.*
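
As a small illustration of the table above, the helper below maps a status code to its class. The names `STATUS_CLASSES` and `describe_status` are hypothetical; they are not part of `requests`.

```python
# Hypothetical helper (not part of requests): map an HTTP status code to its class.
STATUS_CLASSES = {
    1: "informational response",
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe_status(status_code):
    """Return the class of an HTTP status code, e.g. 200 -> 'successful'."""
    return STATUS_CLASSES.get(status_code // 100, "unknown")

print(describe_status(200))
print(describe_status(404))
```

In the notebook, the integer code is available as `page.status_code`, and `page.raise_for_status()` raises an exception for 4xx and 5xx responses.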

## 2. Number of pages to analyze.

Once we have communication, we need to find out the **total number of pages** to scrape.

Each page holds the ads we want to analyze, so it is important to know how to obtain that value, since it varies with the number of ads on offer.

![cantidad_iteraciones.png](images/cantidad_iteraciones.png)

The item is identified as follows.

       'ul', class_='pagination'
       
<br>

It means 'unordered list' with a class name called `pagination`.


To determine the number of iterations, that is, the number of pages from which to extract information, I must:

- Find this element inside the HTML content.

- Read its value.

We will do this with *BeautifulSoup*, which extracts the contents of an element.

In [4]:
soup = BeautifulSoup(page.content, 'html.parser')
# soup <- all site code

So inside the `soup` variable we look for `'ul', class_='pagination'`.

The element we get back contains:

- the links to the **first 5 pages**

- the link to the **next 10 pages**, which comes last and is the one that interests us.

We save it in a variable called `unordered_list`.

In [5]:
unordered_list = soup.find('ul', class_='pagination') # into variable
unordered_list = unordered_list.contents # .contents holds the tag's children as a list
unordered_list

['\n',
 <li>
 <span class="selected">1</span>
 </li>,
 '\n',
 <li>
 <a href="/anuncios/teclados-sintetizadores/pagina2" rel="next">2</a>
 </li>,
 '\n',
 <li>
 <a href="/anuncios/teclados-sintetizadores/pagina3">3</a>
 </li>,
 '\n',
 <li>
 <a href="/anuncios/teclados-sintetizadores/pagina4">4</a>
 </li>,
 '\n',
 <li>
 <a href="/anuncios/teclados-sintetizadores/pagina5">5</a>
 </li>,
 '\n',
 <li>
 <a href="/anuncios/teclados-sintetizadores/pagina11" title="Siguientes 10 páginas">›</a>
 </li>,
 '\n']

### 2.1 Exploring `unordered_list` variable.

We can see that it is a list, so we can check its length and the position of each of its elements.

In [6]:
len(unordered_list) # number of elements

13

In [7]:
unordered_list[0] # first element

'\n'

In [8]:
unordered_list[-1] # last element

'\n'

In [9]:
unordered_list[-2] # this is the one I'm interested in

<li>
<a href="/anuncios/teclados-sintetizadores/pagina11" title="Siguientes 10 páginas">›</a>
</li>

### 2.2 How to get the value number from `unordered_list`?

<br>

Since what I need is the value inside the list item, the strategy is the following:

- Convert the list item to a text string.

- Extract the characters that correspond to numeric values.

- Convert those numeric characters to numbers and keep the highest one.

In other words, I convert the list item into a text string and use regular expressions to extract the numbers and keep the highest value.

Converting `unordered_list[-2]` into a text string:

In [10]:
test = str(unordered_list[-2])
test

'<li>\n<a href="/anuncios/teclados-sintetizadores/pagina11" title="Siguientes 10 páginas">›</a>\n</li>'

`extractMax` is a function that finds the numbers contained in the text, converts them to integers, and returns the largest one.

In [11]:
def extractMax(text):
    # \d+ is a regular expression that means "one or more digits",
    # so findall returns every run of digits in the text,
    # e.g. ['100', '564', '365']
    numbers = re.findall(r'\d+', text)
    # int(string) converts a string into an integer;
    # map applies int() to every element of the list
    numbers = map(int, numbers)
    return max(numbers)  # returns an integer

In [12]:
page_numbers = extractMax(test)
page_numbers

11

We already have the number of pages that we will have to analyze. 
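
Taking the maximum of every number in the tag works here because the page number (11) is larger than the 10 that appears in the `title` attribute. A slightly stricter variant, sketched below, reads the number only from the `paginaN` part of the link. The function name `extract_page_number` is an assumption made for illustration.

```python
import re

def extract_page_number(tag_text):
    """Return the largest N found in 'paginaN' fragments, or None if there is none."""
    pages = [int(n) for n in re.findall(r"pagina(\d+)", tag_text)]
    return max(pages) if pages else None

sample = ('<li>\n<a href="/anuncios/teclados-sintetizadores/pagina11" '
          'title="Siguientes 10 páginas">›</a>\n</li>')
print(extract_page_number(sample))  # 11
```

Returning `None` when no page link is found makes a missing pagination block easy to detect before starting the loop.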

***

## 3. Getting and saving all links (ads and not ads)

Once we know how many pages hold the ads, the next step is to extract those ads from each page by looking inside its code.

So what we have to do is:

- Extract everything that is a link.


- From the extracted links, keep those that end in a number, which is what identifies a link as an ad.

In [13]:
links_ads = []        # every link found on the pages
listado_enlaces = []  # only the links that point to ads

pattern = r"([0-9]{4,9})"  # ad links contain a numeric ad ID of 4 to 9 digits

for pagina in range(page_numbers, 0, -1):
    url = "https://www.hispasonic.com/anuncios/teclados-sintetizadores/pagina{pagina}".format(pagina=pagina)
    print(url)

    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    fecha = soup.find_all('span', class_='miniicon miniicon-date')  # date icons on this page

    # collect every href on this page (skip <a> tags without an href)
    page_links = [link.get('href') for link in soup.find_all('a') if link.get('href')]
    links_ads.extend(page_links)

    # keep only this page's links that contain an ad ID,
    # so earlier pages are not scanned again on every iteration
    for s in page_links:
        if re.search(pattern, s):
            listado_enlaces.append(s)

https://www.hispasonic.com/anuncios/teclados-sintetizadores/pagina11
https://www.hispasonic.com/anuncios/teclados-sintetizadores/pagina10
https://www.hispasonic.com/anuncios/teclados-sintetizadores/pagina9
https://www.hispasonic.com/anuncios/teclados-sintetizadores/pagina8
https://www.hispasonic.com/anuncios/teclados-sintetizadores/pagina7
https://www.hispasonic.com/anuncios/teclados-sintetizadores/pagina6
https://www.hispasonic.com/anuncios/teclados-sintetizadores/pagina5
https://www.hispasonic.com/anuncios/teclados-sintetizadores/pagina4
https://www.hispasonic.com/anuncios/teclados-sintetizadores/pagina3
https://www.hispasonic.com/anuncios/teclados-sintetizadores/pagina2
https://www.hispasonic.com/anuncios/teclados-sintetizadores/pagina1


#### This is a small sample of the contents of the lists

Both lists contain links, but the first one, `links_ads`, mostly holds links that do not interest us.

In [14]:
links_ads[5:20] # example: items 5 to 19 of all the links collected

['/musica',
 '/productos',
 '/anuncios',
 '/anuncios/todo',
 '/anuncios/todo/f/compra-protegida',
 '/anuncios',
 '/anuncios/compraventa',
 '/anuncios/teclados-sintetizadores',
 '/anuncios/todo/f/compra-protegida',
 '/compra-protegida',
 '/anuncios/todo/f/compra-protegida',
 '/index.php?controller=ad&action=new_ad_form',
 '/anuncios/teclados-sintetizadores',
 '/anuncios/teclados-sintetizadores',
 '/anuncios/teclados-sintetizadores/pagina9']

The second list, `listado_enlaces`, however, holds the links we want to get from each page.

In [15]:
listado_enlaces[5:20] # example: only the links that contain an ad number

['/anuncios/jorge-rk100-blanco/1104591',
 '/anuncios/roland-jd-08/1104588',
 '/anuncios/roland-jd-08/1104588',
 '/anuncios/erica-synths-syntrx-2/1104582',
 '/anuncios/erica-synths-syntrx-2/1104582',
 '/anuncios/korg-prologue-8-polyphonic-analogue-synthesizer/1103362',
 '/anuncios/korg-prologue-8-polyphonic-analogue-synthesizer/1103362',
 '/anuncios/elektron-analog-rytm-mk1-muchos-extras/1104532',
 '/anuncios/elektron-analog-rytm-mk1-muchos-extras/1104532',
 '/anuncios/soma-lyra-8-sintetizador-drone-analogico/1098125',
 '/anuncios/soma-lyra-8-sintetizador-drone-analogico/1098125',
 '/anuncios/arturia-matrixbrute/1094941',
 '/anuncios/arturia-matrixbrute/1094941',
 '/anuncios/organo-hammond-x5-mkii/1082067',
 '/anuncios/organo-hammond-x5-mkii/1082067']
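
The pattern used in the loop above matches a run of digits anywhere in the link. Since the ad links shown here end in a numeric ID, a stricter filter can anchor the match to the end of the path. The sketch below uses a few links taken from the outputs above; `AD_LINK` and `is_ad_link` are hypothetical names.

```python
import re

AD_LINK = re.compile(r"/anuncios/[^/]+/\d{4,9}$")  # slug followed by a numeric ad ID

def is_ad_link(href):
    """True when href looks like /anuncios/<slug>/<id>."""
    return bool(href) and AD_LINK.search(href) is not None

sample_links = [
    "/anuncios/todo",
    "/anuncios/teclados-sintetizadores/pagina9",
    "/anuncios/roland-jd-08/1104588",
    None,  # an <a> tag without an href attribute
]
print([link for link in sample_links if is_ad_link(link)])
```

Anchoring the pattern keeps pagination links such as `pagina9` out even if a page number ever grows to four digits.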

## 3.1 Cleaning links.

Looking at `listado_enlaces`, it is striking that many links are repeated, so we need to do a couple of things.

            '...
            '/anuncios/korg-vocoder-vc10/866556',
             '/anuncios/korg-vocoder-vc10/866556',
             '/anuncios/polyend-tracker/1057403',
             '/anuncios/polyend-tracker/1057403',
             '/anuncios/trajetas-teclados/949462',
             '/anuncios/trajetas-teclados/949462',
                                             ...',



- 1. Extract the brand name from the url using regular expressions.

<br>


![regex_expression.png](images/regex_expression.png)

<br>


- 2. Remove the repeated URLs.

<br>

To keep **each URL only once**, we filter them with a dictionary.

The main idea is to use each unique URL as the `key` and assign it the synth brand as the `value`.

In [16]:
os.chdir('/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls') # folder to save htmls.

In [17]:
diccionario_enlaces = {}  # unique ad URL -> brand taken from the URL

listado_marcas = []       # synth brands

# first word after "anuncios/": a digit followed by letters, or letters only
patron_marca = r"((?<=anuncios\/)[1-9][a-z]{1,})|((?<=anuncios\/)[a-z]{1,})"

for enlace in listado_enlaces:
    if enlace not in diccionario_enlaces:
        match = re.search(patron_marca, enlace)
        if match:  # links that do not match the pattern are skipped
            diccionario_enlaces[enlace] = match.group()
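
The same idea on a tiny sample, as a sketch: the dictionary keeps each URL once and maps it to the first word of the ad's URL. The simplified pattern `first_word` and the name `brand_by_link` are assumptions for illustration, not the notebook's exact regex.

```python
import re

sample = [
    "/anuncios/korg-vocoder-vc10/866556",
    "/anuncios/korg-vocoder-vc10/866556",   # repeated link
    "/anuncios/polyend-tracker/1057403",
]

first_word = re.compile(r"(?<=anuncios/)[0-9a-z]+")  # simplified, assumed pattern

brand_by_link = {}
for link in sample:
    match = first_word.search(link)
    if match:
        brand_by_link.setdefault(link, match.group())  # later duplicates are ignored

print(brand_by_link)
```

Because dictionary keys are unique, the repeated link collapses into a single entry without any extra bookkeeping.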

With the dictionary we have just created, we are going to download all the ads to a local folder.

The reason is to avoid overloading the server and running the risk of being banned.

## 3.2 Download all the ads.


To avoid overloading the server, we download every ad to a local folder, adding a delay between downloads. Working from local copies is also more comfortable.

In [18]:
%%time

main_path='https://www.hispasonic.com'
local_path = '/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls/'

for enlace in diccionario_enlaces:
    time.sleep(1)                           # Sleep for 1 second between requests
    page = requests.get(main_path + enlace) # e.g. https://www.hispasonic.com/anuncios/polyend-tracker/1057403
    #print(main_path + enlace)
    enlace = enlace.split("/")              # filter for extracting
    enlace= enlace[2]                       # name ad

    with open(local_path + enlace + '.html',"w+") as f:
        f.write(page.text)
        
    print(local_path + enlace)

/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls/sequential-prophet-5-rev4
/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls/maschine-mk-i
/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls/jorge-rk100-blanco
/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls/roland-jd-08
/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls/erica-synths-syntrx-2
/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls/korg-prologue-8-polyphonic-analogue-synthesizer
/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls/elektron-analog-rytm-mk1-muchos-extras
/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls/soma-lyra-8-sintetizador-drone-analogico
/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls/arturia-matrixbrute
/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls/organo-hammond-x5-mkii
/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls/organo-hammond-x5
/h
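
A sketch of the same download loop as a reusable function. To keep it runnable without network access, the HTTP call is passed in as `fetch`; in the notebook this could be `lambda url: requests.get(url).text`. The names `save_ads` and `fetch`, and the choice to append the ad ID to the file name (so that two ads with the same title do not overwrite each other), are assumptions.

```python
import tempfile
import time
from pathlib import Path

def save_ads(links, fetch, out_dir, base_url="https://www.hispasonic.com", delay=1.0):
    """Download each ad once and store it as <slug>-<id>.html inside out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for link in links:
        _, _, slug, ad_id = link.split("/")       # '/anuncios/<slug>/<id>'
        target = out_dir / f"{slug}-{ad_id}.html"
        if not target.exists():                   # skip ads already on disk
            target.write_text(fetch(base_url + link), encoding="utf-8")
            time.sleep(delay)                     # pause between requests
        saved.append(target)
    return saved

# Demo with a stand-in fetch function, so nothing is requested from the site.
demo_dir = tempfile.mkdtemp()
saved = save_ads(["/anuncios/roland-jd-08/1104588"],
                 fetch=lambda url: "<html>demo</html>",
                 out_dir=demo_dir, delay=0)
print(saved[0].name)  # roland-jd-08-1104588.html
```

Skipping files that already exist means an interrupted run can be resumed without requesting the same ads again.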

## 3.3 It's not all about sales.

<br>

Once the ads have been downloaded, the next step is a quick scan of the downloaded files, because not every ad is a sale.


A starting point is to look at the titles and check whether certain words appear in them.


By using `find` and `grep` together we can see whether the words we are looking for appear inside the files.

<br>



![vendo.png](images/vendo.png)

- **vendo : *sell***

<br>


![busco_piezas.png](images/busco_piezas.png)

- **busco, se busca: *looking for*** and - **piezas: *parts***

<br>


![cambio.png](images/cambio.png)

- **cambio: *trade***

<br>
    
![compro.png](images/compro.png)

- **compro: *buy***

<br>

![regalo.png](images/regalo.png)

- **regalo: *for free***

<br>

This information is very useful because these words are the actions: they let us classify each ad as a sale, a purchase, or any of the other types we have discovered.
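
The `find` and `grep` scan can also be done from Python. The sketch below counts, for each keyword, how many files in a folder contain it. The function name `count_keyword_files` is hypothetical, and matching whole words case-insensitively is an assumption about what counts as a hit.

```python
import re
import tempfile
from pathlib import Path

KEYWORDS = ["vendo", "busco", "busca", "piezas", "cambio", "compro", "regalo"]

def count_keyword_files(folder, keywords=KEYWORDS):
    """Return {keyword: number of .html files in folder that contain it}."""
    counts = dict.fromkeys(keywords, 0)
    for path in Path(folder).glob("*.html"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        for word in keywords:
            if re.search(rf"\b{re.escape(word)}\b", text):
                counts[word] += 1
    return counts

# Demo on two tiny stand-in files instead of the downloaded ads
demo = Path(tempfile.mkdtemp())
(demo / "a.html").write_text("<title>Vendo Korg MS-20</title>", encoding="utf-8")
(demo / "b.html").write_text("<title>Busco piezas Juno</title>", encoding="utf-8")
print(count_keyword_files(demo))
```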

## 3.4 Elements of the ad that we are going to extract.

<br>
The next step is to decide which fields to extract from each ad:

- **description**, **user**, **price**, **brand**, **city**, **date published**, **expiry date**, **times seen**

<br>


![hispa_4.png](images/hispa_4.png)



<br>

The image above shows an example ad with the fields we want to extract.

## 3.5 Extraction of the "action" and the synthesizer name from the description.


<br>

### 3.5-1 Extraction of action

<br>


Extracting the content of the fields is not very complicated. The description, however, poses a problem: how to tell a sale from a purchase or a trade.


The solution is to use a series of keywords as triggers of an **accion: *action*** whenever one of those words appears in the description of the ad.


This is the same thing a person would do to tell whether an ad is a sale or, on the contrary, a gift.


In [19]:
accion = ["compro","cambio","vendo","regalo","busco","busca",'reparar','piezas']

Once we have the `accion` keyword list, the next step is to use each keyword as a trigger, that is, to make it fire a specific action.

<br>

To do that, the words in the `accion` list become the keys of a dictionary, and each value is the function to call for that action.

<br>

    func_dict = {                      # the key give us the action (function)
        "compro":func_compro,
        "cambio":func_cambio,
        ...


    def func_compro(clave_func_dict):  # "compro" means the ad is a purchase, not a sale, and so on...
        if list_compro[-1] == "0":
            list_vendo.pop(-1)
            list_vendo.append("0")
            list_compro.pop(-1)
            list_compro.append("1")
        else:
            pass
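
A runnable sketch of this dispatch idea follows. It is simplified: instead of the `list_vendo` / `list_compro` lists, each handler sets a flag in a per-ad dictionary, and an ad counts as a sale unless a keyword says otherwise. The helper names, the flag names, and that default are assumptions for illustration.

```python
def make_flags():
    """Flags for one ad; by assumption an ad is a sale unless a keyword says otherwise."""
    return {"vendo": 1, "compro": 0, "cambio": 0, "regalo": 0, "busco": 0}

def set_action(flags, action):
    flags["vendo"] = 0       # any other action means the ad is not a plain sale
    flags[action] = 1

# keyword -> handler; several keywords can share one handler
func_dict = {
    "compro": lambda flags: set_action(flags, "compro"),
    "cambio": lambda flags: set_action(flags, "cambio"),
    "regalo": lambda flags: set_action(flags, "regalo"),
    "busco":  lambda flags: set_action(flags, "busco"),
    "busca":  lambda flags: set_action(flags, "busco"),
}

def classify(description):
    """Run the handler of every keyword found in the description."""
    flags = make_flags()
    for word in description.lower().split():
        handler = func_dict.get(word)
        if handler:
            handler(flags)
    return flags

print(classify("Compro Roland Juno 106"))
```

Looking the handler up with `func_dict.get(word)` avoids a long `if`/`elif` chain, and adding a new keyword only needs one more dictionary entry.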
        


<br>

### 3.5-2 Extraction of synthesizer name.

<br>


The next step is to compile all the synthesizer manufacturer brands that may appear in the ads.


An internet search turned up a list with a large number of them, at least to date.


However, because I have been working on the project for some time, I already had a list of names, `sintes`. When I reached this point I realized I had to revise it and merge it with the new list.

Link where I obtained the synth brands: https://www.perfectcircuit.com/modular-synths

#### Synthesizer manufacturers.

In [20]:
sintes = ['0 coast', '0-coast', '000', '4ms', 'a-v-p synth', 'acces', 'access', 'acidlab', 'akai', 'alembic', 'alesis', 'allen & heath', 'allen&heath', 
'analogaudio1', 'analogue solutions', 'analogue systems', 'arp', 'arturia', 'asm', 'asm (ashun sound machines)', 'atomo synth', 'atomosynth', 'audio damage', 
'audiophile circuits league', 'axoloty', 'balaguer', 'baloran', 'bastl instruments', 'befaco', 'behringer', 'behringer', 'bheringer', 'bitbox', 
'black corporation', 'boss', 'bubblesound instruments', 'buchla', 'böhm', 'casio', 'charlie lab', 'charvel', 'chronograf', 'circuit abbey', 'clavia', 
'club of the knobs', 'coast', 'corsynth', 'cre8audio', 'crumar', 'custom made synths', 'cyclone', 'cyclone analogic', 'dave jones design', 'dave smith', 
'dave smith instruments', 'deepmind', 'deepmind 12', 'deepmind 6', 'delptronics', 'delta music', 'denon dj', 'dexibell', 'dexibell', 'digitack', 'doepfer', 
'dreadbox', 'dubreq', 'dynacord', 'e mu', 'e-mu', 'e-mu', 'e:m:c', 'elby designs', 'electribe', 'electronic music laboratories (eml)', 
'electrovoice', 'elektron', 'elka', 'emc', 'emu', 'endorphin.es', 'endorphines', 'ensoniq', 'eowave', 'epiphone', 'erica synth', 'erica synths', 
'ernie ball music man', 'esp ltd', 'eurorack', 'eventide', 'evh', 'evolver', 'exodus digital', 'farfisa', 'fender', 'fishman', 'fodera', 'formanta', 
'frap tool', 'frequency central', 'fretlight', 'friedman', 'future retro', 'futuresonus', 'gator', 'gemini', 'generalmusic', 'gibson', 'godin', 'gotharman', 
'graph tech', 'gretsch', 'guild', 'hammond', 'hartmann', 'hexinverter', 'hinton instruments', 'hofner', 'hypersynth', 'höfner', 'ibanez', 'ik', 'instruo', 
'intellijel', 'iomega', 'isla', 'jackson', 'jaspers', 'john bowen synth design', 'jomox', 'kawai', 'kenton', 'ketron', 'kilpatrick audio', 'knobula', 
'koma elektronik', 'komplete', 'korg', 'kramer', 'kurzweil', 'kurzweil', 'lakland', 'line 6', 'linn electronics', 'livid', 'logan electronics', 'm-audio', 
'macbeth studio systems', 'make', 'malekko', 'manikin electronic', 'maschine', 'mellotron', 'mfb', 'micro modular', 'miditech', 'modal', 'modal electronics', 
'models', 'modor', 'modular', 'modulus', 'monome', 'moog', 'mpc', 'mpc', 'mutable instruments', 'mutant', 'native instruments', 'neutron', 'noise engineering', 
'nord', 'nord electro', 'nord lead 2 rack', 'nord lead 3', 'nord lead 3', 'nord lead 4', 'nord micro modular', 'nord modular', 'nord rack', 'nord stage', 
'nord wave', 'novation', 'numark', 'oberheim', 'octatrack', 'orthogonal devices', 'paratek', 'pearl', 'peavey', 'pioneer dj', 'pittsburgh', 'pittsburgh modular', 
'polyend', 'polygraf', 'ppg (palm products gmbh)', 'prs', 'qu bit', 'qu-bit', 'qu-bit electronix', 'quasimidi', 'qubit', 'quiklok', 'radikal technologies', 
'rhodes', 'rickenbacker', 'roland', 'roli', 'sanson', 'schecter', 'sensel', 'sequencial', 'sequential', 'sequential', 'sequential circuits', 
'sequential circuits', 'sequentix', 'shakmat', 'simmons', 'soma', 'sonicware', 'special waves', 'spector', 'spectral audio', 'sputnik', 'squarp instruments', 
'squier', 'ssff', 'stanton', 'steinberger', 'sterling', 'strymon', 'studio electronics', 'studiologic', 'studiologic music', 'synamodec', 'synthesis technology', 
'synthrotek', 'synthstrom', 'synthstrom', 'synthtech','swissonic', 'tascam', 'taylor', 'technos', 'teenage', 'teenage engineering', 'tiptop', 'tiptop audio', 
'traveler guitar', 'udo audio', 'uno synth ', 'vermona', 'vermona', 'virus', 'viscount', 'volca', 'vox', 'waldorf', 'warwick', 'washburn', 'waves grendel', 
'wersi', 'wersi music', 'winter modular', 'wmd', 'wmd / ssf', 'wurlitzer', 'yamaha', 'yocto', 'zeppelin design labs', 'zoom','1010 music', '2hp', '4ms', 'acid rain technology', 
'acl', 'addac system', 'after later audio', 'aion modular', 'ajh synth', 
'alm busy circuits', 'alright devices', 'analogue solutions', 'bastl instruments', 'befaco', 'blackhole cases', 'blue lantern', 'boredbrain music', 
'bubblesound', 'buchla', 'cosmotronic', 'cre8audio', 'divkid', 'dnipro modular', 'doepfer', 'dreadbox', 'e-rm', 'electrosmith', 'emblematic systems', 
'empress effects', 'endorphin.es', 'eowave', 'erica synths', 'erogenous tones', 'eskatonic modular', 'eventide', 'five12', 'frap tools', 'future sound systems', 
'gieskes', 'grayscale', 'hexinverter', 'industrial music electronics', 'instruo', 'intellijel designs', 'io instruments', 'jomox', 'joranalogue', 'klavis', 
'koma elektronik', 'l-1', 'lmntl', 'low-gain electronics', 'lzx industries', 'make noise', 'malekko heavy industry', 'manhattan analog', 'meng qi', 
'michigan synth works', 'modbap modular', 'moog', 'mordax', 'mosaic', 'mrseri', 'mutable instruments', 'nano modules', 'noise engineering', 
'patching panda', 'percussa', 'pittsburgh modular', 'plankton electronics', 'poly effects', 'qu-bit electronix', 'random source', 'ritual electronics', 
'rossum', 'schlappi engineering', 'shakmat modular', 'soundforce', 'soundmachines', 'squarp', 'steady state fate', 'strymon', 'studio electronics', 
'supercritical', 'synthesis technology', 'system 80', 'tall dog electronics', 'tasty chips', 'tenderfoot electronics', 'tesseract modular', 'tiptop audio', 
'trogotronic', 'tubbutec', 'u-he', 'verbos electronics', 'vermona', 'voicas', 'vpme.de', 'winter modular', 'wmd', 'worng electronics', 'xaoc devices', 
'xor electronics', 'zlob modular',"ASM","Elektron","Moog","Sequential","Teenage Engineering","SOMA Laboratory","Korg","Novation","Modal Electronics",
"Black Corporation","Roland","Arturia","Critter & Guitari","Polyend","UDO","Waldorf","Nord","Yamaha","Vermona","Crumar","JMT Synth","Modor",
"Studio Electronics","Trogotronic","Gieskes","Akai","Dreadbox","Herbs and Stones","IK Multimedia","Tasty Chips","Buchla","Soundmachines",
"Access","Grp","Analogue Solutions","The Division Department","Norand","Jomox","Sonicware","Radikal Technologies","Playtime Engineering",
"1010 Music","Fred's Lab","Kilpatrick Audio","Eowave","Electrosmith","Meng Qi","Studiologic","Suzuki","Nonlinear Labs","Dato","Artiphon",
"Malekko Heavy Industry","Kodamo","Hikari Instruments","Manikin Electronic","Second Sound","","Arturia","Squarp","Polyend","Novation","Akai",
"Roger Linn Design","Conductive Labs","Native Instruments","Faderfox","Sensel","Roland","Keith McMillen","Pioneer","E-RM","Expressive E","Korg",
"M-Audio","Alesis","Joué","Soundforce","Yamaha","Genki","Erica Synths","SOMA Laboratory","Make Noise","Doepfer","Elektron","Moog",
"Teenage Engineering","1010 Music","Expert Sleepers","BASTL Instruments","Kenton","Circuit Happy","MOTU","MIDI Solutions","Solid State Logic",
"Nord","Malekko Heavy Industry","Koma Elektronik","Random Source","Eowave","Zoom","Crumar","Electro-Harmonix","Grp","Michigan Synth Works",
"Analogue Solutions","Knas","iConnectivity","Soundmachines","Eurodesk-Z","Presonus","Torso Electronics","IK Multimedia","ESI Audiotechnik",
"Low-Gain Electronics","Kilpatrick Audio","Artiphon","Instruments of Things","Apogee","SND","Future Retro","Moffenzeef","CME","Embodme",
"Tech 21","Snyderphonics","Tricks Magic Shop","","","Strymon","Vermona","OTO Machines","Dreadbox","Chase Bliss Audio","Boss","GFI","Meris",
"Eventide","SOMA Laboratory","Echo Fix","Fairfield Circuitry","Universal Audio","Gamechanger Audio","EarthQuaker Devices","Death By Audio",
"Sherman","Electro-Harmonix","Old Blood Noise Endeavors","Knas","Red Panda","Malekko Heavy Industry","Kemper","DigiTech","JAM Pedals",
"Erica Synths","Elektron","WMD","1010 Music","Roland","Korg","Poly Effects","Jomox","Thermionic Culture","Decksaver","Warm Audio",
"Zoom","Boredbrain Music","Meng Qi","Electrosmith","Benidub","BAE","Trogotronic","MIDI Solutions","Plankton Electronics","Vongon",
"ART","Hungry Robot","Walrus Audio","Enjoy Electronics","CIOKS","TK Audio","Source Audio","API","Kilpatrick Audio","Voodoo Lab",
"FMR Audio","JHS Pedals","MOD Devices","Cooper FX","Finegear","Ezhi & Aka","Truetone","LastGasp Art Laboratories","Origin Effects",
"Rainger FX","Line 6","PedalTrain","Dr. Scientist","Elta Music","Keeley","Recovery","Glou-Glou","Retro Mechanical Labs","Electro-Faustus",
"Animal Factory","Hologram","Caroline Guitar Company","MXR","Second Sound","Xotic","Dunlop","Adventure Audio","ISP Technologies",
"Industrialectric","Tech 21","Collision Devices","Orgeldream","Universal Audio","API","Solid State Logic","Rupert Neve Designs","Shure","MOTU",
"Warm Audio","Focusrite","Vermona","Focal","Neumann","Roland","Thermionic Culture","Arturia","Zoom","Presonus","Adam","ART","Yamaha",
"TASCAM","Furman","Antelope Audio","Dangerous Music","Pioneer","Echo Fix","Native Instruments","Decksaver","Eventide","Allen & Heath",
"Meris","Sherman","dbx","BAE","Maag Audio","Empirical Labs","Avantone Pro","iConnectivity","Mackie","Audient","beyerdynamic","TK Audio",
"IK Multimedia","Black Lion Audio","RME","Keith McMillen","Golden Age Project","Audio-Technica","Fredenstein","A-Designs","Rosson Audio",
"Daking","Looptrotter","Rode","Prism Sound","Samson","Cranborne Audio","ESI Audiotechnik","Elysia","HEDD","FMR Audio","Heritage Audio",
"Avedis Audio","Sennheiser","Lindell Audio","Blue Microphones","Apogee","Recovery","M-Audio","Zeppelin Design Labs","KRK","AKG",
"Cloud Microphones","Steinberg","Alesis","Dynaudio","Austrian Audio","Auralex","IsoAcoustics","Aston Microphones","Auratone","sE Electronics","SE Electronics",
"Tech 21","Lauten Audio","Cascade Microphones","Soundrise","Pioneer","Allen & Heath","Pro-Ject","PLAYdifferently","U-Turn Audio","Audio-Technica",
"Thorens","Audioengine","Technics","Rane","AKG","Music Hall","Native Instruments","Numark","Sennheiser","Jesse Dean Designs","Ortofon","Decksaver",
"Rosson Audio","MWM","Gator","IK Multimedia","ART","Yamaha","Ultimate Support","RME","Roland","KRK","Austrian Audio","Shure","Odyssey",
"Teenage Engineering","Denon","Record Props","Presonus","Hosa","Hosa","Mogami ","Roland ","Voodoo Lab ","CIOKS ","LMNTL ","Warm Audio ",
"Teenage Engineering ","myVolts ","Gator ","Truetone ","Strymon ","Eurodesk-Z","Furman","Elektron","Tiptop Audio","Retrokits","4MS","EBS",
"Pomona Electronics","Modbang","Intellijel Designs","Plankton Electronics","Radial Engineering","1010 Music","Native Instruments",
"Expert Sleepers","Buchla","iConnectivity","Modbap Modular","Boredbrain Music","Make Noise","Korg","Moog ","Rode ","Shure ",
"LabLab Audio ","Zoom ","Doepfer ","Koma Elektronik ","ADDAC System ","Frap Tools ","Endorphin.es ","ART s","Yamaha ","Walrus Audio",
"ALM Busy Circuits ","Analogue Solutions ","Trogotronic ","Befaco ","Boss ","Soundmachines ","LZX Industries ","Cyclone Analogic ",
"M-Audio ","E-RM ","Pulp Logic ","Electro-Harmonix ","ESI Audiotechnik ","Eskatonic Modular ","Eventide ","Instruo ","Keith McMillen",
"Malekko Heavy Industry ","Dunlop"]

In [21]:
# removing duplicate names from the list.

lista_criba = []

for marca in sintes:
    marca = marca.lower()
    if marca not in lista_criba:
        lista_criba.append(marca)

In [22]:
lista_criba[2:10]

['000', '4ms', 'a-v-p synth', 'acces', 'access', 'acidlab', 'akai', 'alembic']
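
Some entries in `sintes` carry trailing spaces (for example `"Moog "`), so lowercasing alone leaves `'moog'` and `'moog '` as different names. A possible variant, sketched on a small sample, strips whitespace before comparing and uses `dict.fromkeys` to drop duplicates while keeping the first-seen order. The sample list and the name `normalize_brands` are illustrative.

```python
def normalize_brands(names):
    """Lowercase, strip, drop empty strings and duplicates, keep first-seen order."""
    cleaned = (name.strip().lower() for name in names)
    return list(dict.fromkeys(name for name in cleaned if name))

sample = ["moog", "Moog ", "", "Erica Synths", "erica synths", "4MS", "4ms"]
print(normalize_brands(sample))  # ['moog', 'erica synths', '4ms']
```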

- Once the list is free of repeated names, the next step is to split each name into its words, so that a name made of two terms becomes a list of two elements.

In [23]:
# split double names in list

lista_sintes= []

for marca in lista_criba:
    marca = marca.lower()
    if marca not in lista_sintes:
        marca=marca.split()
        lista_sintes.append(marca)

In [24]:
lista_sintes[2:10]

[['000'],
 ['4ms'],
 ['a-v-p', 'synth'],
 ['acces'],
 ['access'],
 ['acidlab'],
 ['akai'],
 ['alembic']]

## 3.6 How to identify manufacturer brands?




<br>

Many names have two or three terms. Some of them are unique, but others share their first word.

For example:

<br>

    ...analogue systems, analogue solutions...
    
<br>

To solve this problem we have to:


- Build a manufacturers' dictionary.

- Implement an algorithm that tells apart manufacturer brands that share a first word.


As an example we will use this small dictionary as if it were our dictionary of synthesizer manufacturers:

  - Manufacturers' dictionary:


    sint3 = {"analogue":["solutions","systems"]} 
    
 

The implementation of the algorithm will be based on detecting:

- *single names*: `Roland`

- *double and unique names*:  `Dave Smith`

- *double names with a first name common to different manufacturers*: `Analogue Systems or Analogue Solutions`





<br>

The particularity of **this** example is that it shows one of the trickiest cases when extracting a name.

### example of algorithm implementation:

The goal of this example is just to detect the correct name in the description variable, i.e.:

- `Analogue Systems`


In [25]:
# How to detect manufacturer brands with same name and different 'surname'


texto_descript = ''

# Our manufacturer brands in a dictionary!
sint3 = {"analogue":["solutions","systems"]}  

# Example description ad
descrip = ['analogue', 'systems', 'woggeblug', '+', 'morphagene', '+', 'optomix'] 

texto_marca_compuesta =''
kompare = ''

list_temp = []
list_brand = []                                     # steps

for idx in descrip:                                 # 1. #7. 
    if idx in kompare:                              # 2. #8.
                                                    
        list_temp.append(idx)                            # 9. 
        for palabritas in list_temp:                 
            texto_descript += ' '+ palabritas            # 10.

        #list_brand.append(texto_descript)               # 11.
        kompare = ''                                     # 12.
        list_temp.clear()                                # 13.

        

    elif idx in sint3:                              # 3.
        if len(sint3[idx]) > 1:                     # 4.
            list_temp.append(idx)                   # 5.
            print(sint3[idx])
            kompare = sint3[idx]                    # 6.

        
print(texto_descript)

['solutions', 'systems']
 analogue systems


- We get the correct name, `analogue systems`, through the dictionary entry where the two brands share a first name.
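
The same logic can be wrapped in a small function that covers the three cases listed earlier (single names, unique double names, and double names that share a first word). This is a sketch under assumptions: the dictionary layout `first word -> list of second words`, with an empty list marking a single-word brand, and the name `detect_brands` are chosen here for illustration.

```python
def detect_brands(tokens, brands):
    """Return the brands found in tokens.

    brands maps a first word to the list of valid second words;
    an empty list means the brand is a single word.
    """
    found = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        if word in brands:
            seconds = brands[word]
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if not seconds:                 # single name, e.g. 'roland'
                found.append(word)
            elif nxt in seconds:            # double name, unique or shared first word
                found.append(f"{word} {nxt}")
                i += 1                      # the second word is consumed
        i += 1
    return found

brands = {"analogue": ["solutions", "systems"], "dave": ["smith"], "roland": []}
descrip = ['analogue', 'systems', 'woggeblug', '+', 'morphagene', '+', 'optomix']
print(detect_brands(descrip, brands))  # ['analogue systems']
```

A first word that is not followed by one of its known second words is ignored, so a lone `analogue` in a description does not produce a brand.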

In [26]:
# Steps, in the order they run for the example description:
#
#  1. The for loop starts with the first word of the description.
#  2. If idx is one of the candidates stored in kompare, go to step 8; otherwise go to the elif.
#  3. If idx is a key of the sint3 dictionary...
#  4. ...and its value holds more than one element, there are at least two possible brands.
#  5. Store this first match in the temporary list.
#  6. Save the list of possible second names in kompare, which was initially the empty string ''.
#  7. On the next iteration, idx holds the second name of the synth brand.
#  8. idx matches one of the candidates in kompare.
#  9. Store this second match in the temporary list.
# 10. Join the items of the temporary list into a text string.
# 11. Append the string to the brand list (commented out in the example).
# 12. Reset the kompare variable.
# 13. Clear the temporary list.



In the previous example we used a small dictionary that we could write by hand, but we need a dictionary with all the manufacturers.


## Building the manufacturer's dictionary

In [27]:
marcas_nombres = []


def sint_word(sintex):
    marcas_nombres.append(sintex)
    return [marcas_nombres[-1]]

def sint_more_word_rep(sintex):
    # no need to return a list here: the dictionary entry already holds one
    marcas_nombres.append(sintex)
    return marcas_nombres[-1]


dict_funct = {"sint_word":sint_word,
            "sint_more_word_rep":sint_more_word_rep
}

dict_marca = {}

tag_mark = ''
for marcas in lista_sintes:
    if len(marcas) == 1:
        if marcas[0] not in dict_marca:
            tag_mark = 'sint_word'
            brand = marcas[0]
            ret = dict_funct[tag_mark](brand)
            
            dict_marca[brand] = ret
            #print("x")
    elif len(marcas) > 1:                           # here the brand has this format: ['0', 'coast']
        if marcas[0] not in dict_marca:
            tag_mark = 'sint_word'
            #print(marcas[0])
            #print(marcas[1])

           
            ret = dict_funct[tag_mark](marcas[1])
            dict_marca[marcas[0]] = ret

        elif marcas[0] in dict_marca:
            tag_mark = 'sint_more_word_rep'
            ret = dict_funct[tag_mark](marcas[1])
            dict_marca[marcas[0]].append(ret)
            #print("x")
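
The same mapping can be built more compactly with `dict.setdefault`. The following is a hedged sketch under the assumption that `lista_sintes` is a list of word lists such as `['korg']` or `['analogue', 'systems']`; `build_brand_dict` is a hypothetical name, and unlike the cell above it skips repeated second words.

```python
def build_brand_dict(brands):
    """Map the first word of each brand to the list of words that may follow it."""
    brand_dict = {}
    for words in brands:
        if not words:
            continue
        first = words[0]
        if len(words) == 1:
            # Single-word brand: it maps to itself unless the key already exists.
            brand_dict.setdefault(first, [first])
        else:
            # Compound brand: collect each distinct second word once.
            followers = brand_dict.setdefault(first, [])
            if words[1] not in followers:
                followers.append(words[1])
    return brand_dict


demo = [["analogue", "solutions"], ["analogue", "systems"],
        ["analogue", "solutions"], ["korg"], ["0", "coast"]]
print(build_brand_dict(demo))
```

Skipping duplicates keeps entries such as `'analogue'` from listing `'solutions'` twice, which does not change the matching logic because only membership is tested.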

In [28]:
print(dict_marca)

{'0': ['coast'], '0-coast': ['0-coast'], '000': ['000'], '4ms': ['4ms'], 'a-v-p': ['synth'], 'acces': ['acces'], 'access': ['access'], 'acidlab': ['acidlab'], 'akai': ['akai'], 'alembic': ['alembic'], 'alesis': ['alesis'], 'allen': ['&'], 'allen&heath': ['allen&heath'], 'analogaudio1': ['analogaudio1'], 'analogue': ['solutions', 'systems', 'solutions'], 'arp': ['arp'], 'arturia': ['arturia'], 'asm': ['asm', '(ashun'], 'atomo': ['synth'], 'atomosynth': ['atomosynth'], 'audio': ['damage'], 'audiophile': ['circuits'], 'axoloty': ['axoloty'], 'balaguer': ['balaguer'], 'baloran': ['baloran'], 'bastl': ['instruments'], 'befaco': ['befaco'], 'behringer': ['behringer'], 'bheringer': ['bheringer'], 'bitbox': ['bitbox'], 'black': ['corporation', 'lion'], 'boss': ['boss'], 'bubblesound': ['instruments'], 'buchla': ['buchla'], 'bã¶hm': ['bã¶hm'], 'casio': ['casio'], 'charlie': ['lab'], 'charvel': ['charvel'], 'chronograf': ['chronograf'], 'circuit': ['abbey', 'happy'], 'clavia': ['clavia'], 'club'

In the earlier `sint3` example, the output gives us:

<br>

- The content of `sint3[idx]`, `['solutions', 'systems']`, shown by the `print` call.


- The right name for the description, as we expected.

<br>

Now that we understand how it works, we can implement the full code.


## 3.7 Detecting manufacturer brands.



In [29]:
compare = ''                    #variable where the middle name is saved
marca_del_sinte = ''            # empty variable for store synth brand 
texto_descriptivo = ''          #ad descriptive text
list_temp = []                  #temporary list to detect the middle name 

                                # buy, sell, change... lists.
list_compro = []
list_cambio = []
list_vendo = []
list_regalo = []
list_busco = []
list_rebaja = []
list_reparar = []
list_piezas = []
list_urgente = []
list_oferta = []

list_brand = []                 # manufacturers synth brand
list_descripcion = []           # final ad description on dataframe output 
texto_descriptivo_salida = []   # this is what will be shown as the ad content

list_price = []                 # price
list_user = []                  # user
list_city = []                  # city
list_published = []             # date published
list_expire = []                # ad expiry date
list_times_seen= []             # times seen ad

lista_palabras_para_eliminar = [] # remove words from description list

def func_compro(clave_func_dict): 

    if list_compro[-1] == "0":
        list_vendo.pop(-1)
        list_vendo.append("0")
        list_compro.pop(-1)
        list_compro.append("1")
    else:
        pass


def func_cambio(clave_func_dict):
    if list_cambio[-1] == "0":
        list_vendo.pop(-1)
        list_vendo.append("0")
        list_cambio.pop(-1)
        list_cambio.append("1")

    else:
        pass

def func_vendo(clave_func_dict):
    if list_vendo[-1] == "0":
        list_vendo.pop(-1)
        list_vendo.append("1")
    else:
        pass

def func_regalo(clave_func_dict): 
    if list_regalo[-1] == "0":
        list_vendo.pop(-1)
        list_vendo.append("0")
        list_regalo.pop(-1)
        list_regalo.append("1")
    else:
        pass

def func_busco(clave_func_dict):  # if looking for, then is not a sell...
    if list_busco[-1] == "0":
        list_vendo.pop(-1)
        list_vendo.append("0")
        list_busco.pop(-1)
        list_busco.append("1")
    else:
        pass

def func_reparar(clave_func_dict):
    if list_reparar[-1] == "0":     # flip the repair flag of the current ad only once
        list_reparar.pop(-1)
        list_reparar.append("1")
    else:
        pass

def func_piezas(clave_func_dict):
    if list_piezas[-1] == "0":      # flip the parts flag of the current ad only once
        list_piezas.pop(-1)
        list_piezas.append("1")
    else:
        pass

def func_rebaja(clave_func_dict):
    if list_rebaja[-1] == "0":
        list_vendo.pop(-1)
        list_vendo.append("0")
        list_rebaja.pop(-1)
        list_rebaja.append("1")
    else:
        pass

def func_oferta(clave_func_dict):
    if list_oferta[-1] == "0":
        list_oferta.pop(-1)
        list_oferta.append("1")
    else:
        pass



func_dict = {                     # function dictionary
    "compro":func_compro,
    "cambio":func_cambio,
    "vendo":func_vendo,
    "vende":func_vendo,
    "regalo":func_regalo,
    "busco":func_busco,
    "busca":func_busco,
    "reparar":func_reparar,
    "piezas":func_piezas,
    "rebajado":func_rebaja,
    "rebaja":func_rebaja,
    "oferta":func_oferta
    
}

def remove_compro(clave_func_dict):
    #list_compro.append(clave_func_dict
    list_compro.remove(clave_func_dict)


#rmv_func = {"compro":remove_compro}


def urgente():                     # if an "accion" word is repeated in the description, the ad is treated as urgent
    list_urgente[-1] = "1"         # flag the current (last) ad only


def eliminar_signos(txt):          # cleaning text
    description = txt.replace(":"," ")
    descripcion = description.replace(";"," ")
    descripcion_1 = descripcion.replace("("," ")
    descripcion_2 = descripcion_1.replace(")"," ")
    descripcion_3 = descripcion_2.replace("/"," ")
    descripcion_4 = descripcion_3.replace("."," ").lower()
    descripcion_5 = descripcion_4.split()
    return descripcion_5


def default_atributes():         # default flags: every ad is a sale unless an action word says otherwise
    list_cambio.append("0")
    list_compro.append("0")
    list_urgente.append("0")
    list_vendo.append("1")
    list_regalo.append("0")
    list_reparar.append("0")
    list_piezas.append("0")
    list_busco.append("0")
    list_rebaja.append("0")      # read by func_rebaja through list_rebaja[-1]
    list_oferta.append("0")      # read by func_oferta through list_oferta[-1]
    list_brand.append("-")



### Start


for pagina_anuncio in os.listdir('.'):       # '.' current folder
    #if "kurzweil" in pagina_anuncio:                               #compro-cambio
    with open(pagina_anuncio, 'r') as pagina_bruto:

        pagina_analizar = pagina_bruto.read()
        soup = BeautifulSoup(pagina_analizar, 'html.parser')


        node = soup.find('h1') 

    if  node is not None:                       # skip pages without an <h1>, avoiding an AttributeError on None
        descripcion = node.text 
        descripcion = eliminar_signos(descripcion)
        #print(descripcion)

        default_atributes()


        for word_1 in descripcion:
            if word_1 in accion:
                func_dict[word_1](word_1)
                lista_palabras_para_eliminar.append(word_1)

            elif word_1 in compare:
                list_temp.append(word_1)

                for marca_sinte in list_temp:
                    marca_del_sinte += marca_sinte + ' '
                    lista_palabras_para_eliminar.append(marca_sinte) # mark the words of the compound synth name for removal from the description.

                #marca_del_sinte = ''
                list_brand.pop(-1)
                list_brand.append(marca_del_sinte)

                compare = '' # variable with the second part of the name, so I clean it here.

            elif word_1 in dict_marca:
                size_brand = len(dict_marca[word_1])

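                # NOTE: list_brand is a list, so `list_brand != "-"` is always True;
                # each single-word brand found in the title overwrites the previous one.
                if ((size_brand == 1) and (list_brand != "-")) :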
                    list_brand.pop(-1)
                    list_brand.append(word_1)


                elif ((size_brand == 1) and (list_brand == "-")) :
                    list_descripcion.append(word_1)

                elif size_brand > 1:
                    compare = dict_marca[word_1]
                    list_temp.append(word_1)

                elif list_brand != "-":
                    list_descripcion.append(word_1)

            marca_del_sinte = ''

        list_temp.clear()

        duplicates = [element for element in lista_palabras_para_eliminar if lista_palabras_para_eliminar.count(element) > 1]
        unique_duplicates = list(set(duplicates))

        size_unique_duplicates = len(duplicates)

        if size_unique_duplicates > 1:
            urgente()


        for eliminar in lista_palabras_para_eliminar:
            try:
                descripcion.remove(eliminar)
            except ValueError:                  # the word was already removed from the description
                pass

        for palabras in descripcion:
            texto_descriptivo += palabras + ' '


        texto_descriptivo_salida.append(texto_descriptivo)

        texto_descriptivo =''


        # --- price

        try:
            price = soup.find('div',class_='ad-price').text
            price = int(price.split()[0])
            list_price.append(price)
        except:
            price = 0
            list_price.append(price)
            

        # --- user name

        user = soup.find('div',class_='col-lg-7').a.text
        list_user.append(user)
        

        # --- city

        city = soup.find('div',class_='col-lg-7').div.strong.text
        list_city.append(city)

        # --- published
        
        publish= ' '

        published = soup.find('div',class_='col-lg-7').div.text.split()[-5:-2]
        
        for indx in published:
            if '/' in indx:
                #print(indx)
                publish = indx
            elif 'hace' in indx:
                a = published.index(indx)
                #print(published[a+1] + ' ' + published[a+2])
                publish = published[a+1] + ' ' + published[a+2]

            
        list_published.append(publish)

        # --- expire 

        expire = soup.find('div',class_="expira").text.split()[1]
        list_expire.append(expire)

        # --- times seen
        
        seen = soup.find('div',class_="expira").text.split()[4]
        list_times_seen.append(seen)

        lista_palabras_para_eliminar.clear()
        #print(pagina_anuncio)
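
The flag handling in the cell above can also be expressed per ad, as one dictionary of flags instead of parallel lists. This is a hedged sketch, not the notebook's implementation: `tokenize`, `classify_title`, `ACTION_FLAGS` and `NOT_A_SALE` are hypothetical names, only the flags that end up in the DataFrame are covered, and "urgent" is simplified to "an action word appears twice".

```python
# Hypothetical mapping from Spanish action words to the DataFrame flag names.
ACTION_FLAGS = {
    "compro": "buy", "cambio": "change", "vendo": "sell", "vende": "sell",
    "regalo": "gift", "busco": "search", "busca": "search",
    "reparar": "repair", "piezas": "parts",
}
NOT_A_SALE = {"buy", "change", "gift", "search"}   # these flags switch "sell" off
FLAG_NAMES = ("urgent", "buy", "change", "sell", "gift", "search", "repair", "parts")


def tokenize(title):
    """Replace the same punctuation as eliminar_signos, lowercase and split."""
    table = str.maketrans(":;()/.", " " * 6)
    return title.translate(table).lower().split()


def classify_title(title):
    """Return the '0'/'1' flags for one ad title; every ad starts as a sale."""
    flags = dict.fromkeys(FLAG_NAMES, "0")
    flags["sell"] = "1"
    seen = set()
    for word in tokenize(title):
        flag = ACTION_FLAGS.get(word)
        if flag is None:
            continue
        if word in seen:              # repeated action word: treat as urgent
            flags["urgent"] = "1"
        seen.add(word)
        flags[flag] = "1"
        if flag in NOT_A_SALE:
            flags["sell"] = "0"
    return flags


print(classify_title("Busco/compro Korg MS-20."))
```

Keeping the flags of one ad together means a list of such dictionaries can be passed straight to `pd.DataFrame`, and no column can end up with a different length than the others.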



In [30]:
! pwd

/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls


## 3.8 Date extraction

The next step is to record the extraction date. This matters because it is the reference point for interpreting relative publication times such as 3 days, 1 week or 5 hours before the records were collected.

In [31]:
hoy = dt.datetime.now()
year=str(hoy.year)

month=str(hoy.month)
day=str(hoy.day)

date_scrapped = day + '/' + month + '/' + year

# ex: date_scrapped = '26' + '/' + '08' + '/' + '2022' 

date_scrapped

'4/6/2023'
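
The value above is not zero-padded, while the dates published by the site (for example `28/05/2023`) are. A minimal sketch of producing the padded form with `strftime` follows; `format_scrape_date` is a hypothetical helper name, and the fixed date is only there to make the example reproducible.

```python
import datetime as dt


def format_scrape_date(day):
    """Format a date as DD/MM/YYYY with zero-padded day and month."""
    return day.strftime("%d/%m/%Y")


print(format_scrape_date(dt.date(2023, 6, 4)))   # 04/06/2023
```

In the notebook, `format_scrape_date(dt.datetime.now())` would give the same layout for the current day.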

- The DataFrame is created in the `df` variable.

In [32]:
df = pd.DataFrame({'urgent':list_urgente,
                   'buy':list_compro,
                   'change':list_cambio,
                   'sell':list_vendo,
                   'price':list_price,
                   'gift':list_regalo,
                   'search':list_busco,
                   'repair':list_reparar,
                   'parts':list_piezas,
                   'synt_brand':list_brand,
                   'description':texto_descriptivo_salida,
                   'user':list_user,
                   'city':list_city,
                   'published':list_published,
                   'expire':list_expire,
                   'date_scrapped':date_scrapped,
                   'seen':list_times_seen
                  },index = list(range(1,len(texto_descriptivo_salida)+1)))

## 3.9 Clean the column of the publication dates.


As we can see, the publication date sometimes has the correct format and sometimes is a time relative to the current date, so it has to be corrected.

The solution is to create a function that reads the relative format and converts it to the correct date and format.


For this we have to cover every case that can occur.

In [33]:
semanas = ['1 semana', '2 semanas', '3 semanas', '4 semanas']
dias = ['1 día', '2 días', '3 días', '4 días', '5 días', '6 días', '7 días']
horas = ['1 hora','2 horas', '3 horas', '4 horas', '5 horas', '6 horas',
        '7 horas','8 horas', '9 horas', '10 horas', '11 horas', '12 horas',
        '13 horas', '14 horas','15 horas', '16 horas', '17 horas', '18 horas',
        '19 horas', '20 horas', '21 horas', '22 horas','23 horas', '24 horas']

In [34]:
minutes=[]
for mint in range(1,61):
    if mint < 2:
        texto = str(mint) + ' minuto'
        minutes.append(texto)
    else:
        texto = str(mint) + ' minutos'
        minutes.append(texto)

- `nice_format` is the function that identifies the relative time intervals the website gives us and converts them to dates.

In [35]:
def nice_format(parameter):

    days_inweek = 7

    hoy = dt.datetime.now()
    year=str(hoy.year)

    month=str(hoy.month)
    day=str(hoy.day)

    date_scrapped = day + '/' + month + '/' + year
    
    current_datetime = dt.datetime.strptime(date_scrapped,"%d/%m/%Y") 
    
    
    if parameter in semanas:
        
        num_semana = parameter.split()
        num_semana = int(num_semana[0])
        cambio_semana = semanas[num_semana-1]
        
        dias_semana = (num_semana * days_inweek)
        
        fecha_real_semana = current_datetime - dt.timedelta(dias_semana)
        
        fecha_real_semana = fecha_real_semana.strftime("%d/%m/%Y")
                
        df['published'] = df['published'].replace( to_replace = cambio_semana, value = fecha_real_semana) #+ ' semana'
        
        
    elif parameter in dias:
        num_dia = parameter.split()
        num_dias = int(num_dia[0])
        cambio_dia = dias[num_dias-1]

        fecha_real_dia = current_datetime - dt.timedelta(num_dias)
        fecha_real_dia = fecha_real_dia.strftime("%d/%m/%Y")
        
        df['published'] = df['published'].replace( to_replace = cambio_dia, value = fecha_real_dia) #+ ' semana'
        
        
    elif parameter in horas:
        num_hora = parameter.split()
        num_hora = int(num_hora[0])
        
        if (parameter != '24 horas'):
            hora_real = current_datetime
            hora_real = hora_real.strftime("%d/%m/%Y")
            
            df['published'] = df['published'].replace(to_replace = parameter,
                                              value = hora_real)
        
        elif parameter == '24 horas':
            horas_24 = 1
            hora_real = current_datetime - dt.timedelta(horas_24)
            hora_real = hora_real.strftime("%d/%m/%Y")
            
            df['published'] = df['published'].replace( to_replace = parameter,value = hora_real ) #+ ' semana'
    
    
    elif parameter in minutes:
        # published some minutes ago: the date is the scraping day itself
        hora_real = current_datetime.strftime("%d/%m/%Y")
            
        df['published'] = df['published'].replace( to_replace = parameter,value = hora_real )

In [36]:
df['published'].apply(nice_format)
print('')
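
The same conversion can be written without the lookup lists by parsing the number and the unit with a regular expression. This is a hedged sketch, not the notebook's method: `to_absolute_date` and `scraped_at` are hypothetical names, it assumes the site only uses minutes, hours, days and weeks, and it subtracts from the actual scraping time instead of treating every interval under 24 hours as "today".

```python
import datetime as dt
import re

# Spanish unit (singular) -> keyword argument of datetime.timedelta
_UNITS = {"minuto": "minutes", "hora": "hours", "día": "days", "semana": "weeks"}
_PATTERN = re.compile(r"^(\d+)\s+(minuto|hora|día|semana)s?$")


def to_absolute_date(value, scraped_at):
    """Turn '3 días' into DD/MM/YYYY; return already absolute dates unchanged."""
    match = _PATTERN.match(value.strip())
    if match is None:
        return value
    amount = int(match.group(1))
    unit = _UNITS[match.group(2)]
    published = scraped_at - dt.timedelta(**{unit: amount})
    return published.strftime("%d/%m/%Y")


reference = dt.datetime(2023, 6, 4, 12, 0)
for sample in ["3 días", "2 semanas", "5 horas", "28/05/2023"]:
    print(sample, "->", to_absolute_date(sample, reference))
```

Because the function returns a value instead of modifying `df`, it could be applied element by element, for example with `df['published'].map(lambda v: to_absolute_date(v, reference))`.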




### 4. Anonymizing user names

Although users in the forum use aliases, the most sensible thing is to create a function that anonymizes the user names.

In [37]:
def anonimizer(value):
    variable = 0
    for letras in value:
        variable += int(ord(letras))
    return variable

In [38]:
df['anon_user'] = df['user'].apply(anonimizer)
print('')
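
Summing character codes gives the same number to any two names made of the same letters, so different users can collide. As a hedged alternative sketch (not what the notebook uses), a salted hash from the standard `hashlib` module keeps distinct names apart; `pseudonymize` and the `salt` value are hypothetical and would need to be chosen and kept private in practice.

```python
import hashlib


def pseudonymize(user, salt="change-me"):
    """Return a short, stable pseudonym for a user name (salted SHA-256)."""
    digest = hashlib.sha256((salt + user).encode("utf-8")).hexdigest()
    return digest[:12]


# The sum of character codes cannot tell these two names apart.
print(sum(map(ord, "abc")) == sum(map(ord, "cba")))
# The hash-based pseudonyms differ.
print(pseudonymize("abc") != pseudonymize("cba"))
```

The same name always maps to the same pseudonym, so ads by one user can still be grouped after the original `user` column is dropped.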




In [39]:
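# DataFrame.drop returns a new DataFrame; the result is not assigned, so `df` keeps all its columns
df.drop("anon_user",axis ='columns')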
print('')




### Last step

We already have all the data inside the DataFrame. The only thing left to do is to save the content to a CSV file.

In [40]:
df.to_csv('df_hispa_4_06_2023.csv', index = True)

In [41]:
dff = pd.read_csv("df_hispa_4_06_2023.csv")
dff

Unnamed: 0.1,Unnamed: 0,urgent,buy,change,sell,price,gift,search,repair,parts,synt_brand,description,user,city,published,expire,date_scrapped,seen,anon_user
0,1,0,0,0,1,0,0,0,0,0,radikal,"sintetizador radikal technologies accelerator,...",rolandhispasonic,Jaén,28/05/2023,01/07/2023,4/6/2023,314,1713
1,2,0,0,0,1,0,0,0,0,0,mpc,akai mpc 61 keys nuevo,Luisrro,Gipuzkoa,28/05/2023,30/06/2023,4/6/2023,202,752
2,3,0,0,0,1,0,0,0,0,0,roland,roland fantom x8,Aguilar242,Navarra,07/05/2023,28/06/2023,4/6/2023,266,861
3,4,0,0,0,1,0,0,0,0,0,roland,roland tr8,trocol,Madrid,03/05/2023,29/06/2023,4/6/2023,240,659
4,5,0,0,0,1,0,0,0,0,0,korg,korg wavestate,Markus,Granada,30/05/2023,29/07/2023,4/6/2023,78,627
5,6,0,0,0,1,0,0,0,0,0,decksaver,akai force + decksaver + opx4,dce,Sevilla,28/05/2023,25/07/2023,4/6/2023,156,300
6,7,0,0,0,1,0,0,0,0,0,roland,roland 1976 analog synth genesis cure camel st...,TheMix,Pontevedra,29/12/2022,28/06/2023,4/6/2023,1243,591
7,8,0,0,0,1,0,0,0,0,0,-,leslie 760 n autónomo,windbass,Zaragoza,16/06/2020,02/07/2023,4/6/2023,2646,859
8,9,0,0,0,1,0,0,0,0,0,octatrack,elektron octatrack mki + 2 tarjetas 64gb y 16gb,Javier m,Córdoba,12/10/2022,29/06/2023,4/6/2023,927,750
9,10,0,0,1,0,0,0,0,0,0,korg,o sintetizador teclado korg tr 88 workstation,txominzar,Gipuzkoa,21/05/2023,02/07/2023,4/6/2023,226,1004
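
The `Unnamed: 0` column in the table above is the index that `to_csv(..., index=True)` wrote to the file and that `read_csv` then read back as ordinary data. A minimal sketch of restoring it as the index follows; the two-row DataFrame is made-up demo data and an in-memory buffer stands in for the CSV file.

```python
import io

import pandas as pd

# Made-up demo rows; the real notebook would use `df`.
df_demo = pd.DataFrame({"synt_brand": ["korg", "roland"], "price": [450, 300]},
                       index=[1, 2])

buffer = io.StringIO()                 # stands in for the CSV file on disk
df_demo.to_csv(buffer, index=True)
buffer.seek(0)

# index_col=0 tells read_csv that the first column is the index, not data.
restored = pd.read_csv(buffer, index_col=0)
print(restored)
```

Alternatively, writing with `index=False` leaves the index out of the file altogether.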


In [42]:
! pwd

/home/ion/Documentos/albertjimrod/personal_proj_hispasonic/htmls
