Logbook for version 0.1 #1
The blitter draws the complete picture (the Workbench initial screen) with multiple single draw operations, as far as I read somewhere. I think you will have to implement at least the blitter logic for it then. That would mean the first goal is already set pretty high... Nice goal, Dirk!! I totally like this logbook story format ;-)
You are right. Although V0.1 won't do much from a user's perspective, a lot of under-the-hood stuff needs to work to make the image appear. Right now, I'm trying to find out how the DMA time slot allocation is implemented in SAE (the JavaScript UAE clone). It must be a core piece of the emulator, but I haven't found it yet. What I did find is two event tables: eventtab and eventtab2. The first one covers four events:
The second one covers two:
I guess the Disk, Blitter and Audio DMA cycles are implemented within the event handlers. Unfortunately, I haven't found the code fragments that implement Sprite and Bitplane DMA. It is quite difficult to crawl through the UAE or SAE code, because there are hardly any comments and the code is far from self-explanatory.
I attached a crisper image, because in your picture the slots for the 68k, Blitter and Copper are hard to distinguish from the 320-pixel-mode bitplane DMA. It is taken from bloodline's blog, where he explains how he would implement the DMA sequencer (look out for "Theory time"): http://eab.abime.net/showthread.php?t=90316&page=9
Detailed information about the DMA sequence: http://amigadev.elowar.com/read/ADCD_2.1/Hardware_Manual_guide/node012B.html
Hmm, in SAE there is a lot of stuff going on in the functions hsync_handler_pre() and hsync_handler_post(). It seems like SAE is not cycle-accurate, but line-accurate 🤔. I'm a little bit confused right now.
I found something in custom.js: the function DMACON(v, hpos) maybe looks like a dispatcher for DMA slots... What is this readmap and writemap thing? It seems to be a function map that somehow depends on hpos. In WinUAE it is different: in custom.cpp there is a function "static int dma_cycle (void)" which seems to do the DMA sequencing. Or am I wrong...
DMACON and DMACONR seem to be the write and read handlers for the DMACON register. They only occur in the read and write maps for the OCS registers:
Thanks a lot for the links to the DMA sequencer forum thread. They might be very useful for us!
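To illustrate what a readmap/writemap pair does in general, here is a minimal C++ sketch. It is not SAE's actual code, and all class and member names are hypothetical: one handler per 16-bit custom register, indexed by the register offset from $DFF000. Only the offsets of DMACONR ($002) and DMACON ($096) come from the Amiga register map; the SET/CLR semantics of DMACON are deliberately left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical dispatch tables in the spirit of a readmap/writemap pair:
// one handler per 16-bit custom register, indexed by (offset & 0x1FE) >> 1.
class CustomRegisters {
public:
    using ReadFn  = std::function<uint16_t()>;
    using WriteFn = std::function<void(uint16_t)>;

    CustomRegisters() {
        readMap.fill([] { return uint16_t{0}; });  // unmapped reads return 0 (assumption)
        writeMap.fill([](uint16_t) {});            // unmapped writes are ignored
        // DMACONR (read-only) lives at offset $002, DMACON (write-only) at $096.
        readMap[0x002 >> 1]  = [this] { return dmacon; };
        writeMap[0x096 >> 1] = [this](uint16_t v) { dmacon = v; };  // SET/CLR not modelled
    }

    // The handlers capture 'this', so copying would leave dangling pointers.
    CustomRegisters(const CustomRegisters&) = delete;
    CustomRegisters& operator=(const CustomRegisters&) = delete;

    uint16_t read(uint32_t addr)              { return readMap[(addr & 0x1FE) >> 1](); }
    void     write(uint32_t addr, uint16_t v) { writeMap[(addr & 0x1FE) >> 1](v); }

private:
    uint16_t dmacon = 0;
    std::array<ReadFn, 256>  readMap;
    std::array<WriteFn, 256> writeMap;
};
```

A table of handlers keeps the hot path to a single indexed call, which is presumably why such maps show up in JavaScript emulators, where a large switch statement is comparatively slow.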
I've found this in SAE:
This is called after every HSYNC. It schedules an event that runs dmal_emu in a loop. dmal_emu performs disk or audio DMA, depending on the horizontal position. This means that the CPU cannot interfere, because the DMA is emulated in a single chunk. This indicates line accuracy, but I always thought that UAE was cycle-accurate. I'm wondering how bitplane data is read in...
More findings: An HSYNC event calls Interestingly,
At least it seems to work this way for sprites. I'm pretty much in favour of emulating DMA access as it is explained in the Raspberry Pie Forum thread (the VICII in VirtualC64 works the same way). It would be slower, but it seems simpler and less error-prone.
Raspberry Pie Forum? I totally agree with you. I also think the way bloodline is going is my favourite.
The CIAs are connected to memory now, so we are able to process a few more instructions. We are here at the moment:
The first MOVE instruction configures pins PA0 and PA1 of CIAA as outputs, and the second MOVE instruction sets PA0 to 0 and PA1 to 1. PA0 controls the Kickstart overlay, and writing a 0 means that the Kickstart should no longer be overlaid. When the two instructions are executed, the memory panel shows that this is indeed the case. We now see the Chip Ram blended in: PA1 controls the LED. The LED is switched off, but because it was already switched off before, we don't see anything of interest. The next important instructions are:
The first two statements reset the interrupt registers, which can be verified in the Paula panel: I agree that it's not that spectacular, because 0 is also the initial value on startup. However, a temporary debug message in the console tells me that the registers have indeed been written to:
The next instruction produces
So it's clear what to implement next: the DMACON register inside Agnus…
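Back to the two CIA writes described above: their effect on the overlay and the LED can be modelled in a few lines. This C++ sketch is not vAmiga's CIA code. It uses the pin assignments from the text (PA0 = overlay, PA1 = LED, where writing 1 switches the LED off) and assumes that pins configured as inputs read as 1 because of pull-ups, which is why the overlay is active right after reset.

```cpp
#include <cassert>
#include <cstdint>

// Minimal model of CIA-A port A as seen by the memory system (hypothetical type).
struct CiaAPortA {
    uint8_t ddra = 0x00;  // data direction register: 1 = output
    uint8_t pra  = 0x00;  // output latch

    // Output pins follow the latch; input pins float high (pull-ups assumed).
    uint8_t pins() const { return static_cast<uint8_t>((pra & ddra) | (~ddra & 0xFF)); }

    bool overlay() const { return (pins() & 0x01) != 0; }  // PA0: Kickstart overlay
    bool ledOn()   const { return (pins() & 0x02) == 0; }  // PA1: 0 = LED on
};
```

Writing 3 into the data direction register and 2 into the port register, as described above, clears the overlay and leaves the LED off.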
Now I can step through until I reach the Expansion RAM Checker at
Good to have a commented Kickstart around:
Incomplete address decoding 😳. Never heard of it 🤭. If I'm right, UAE handles unmapped memory via the "dummy" bank handlers... Let's see what they do there...
I've found a Verilog reimplementation of Gary here (Amiga FPGA project): https://github.com/rkrajnc/minimig-de1/blob/master/rtl/minimig/Gary.v According to the line
"Incomplete address decoding" means that Gary selects the custom registers if the upper three address bits match. I've adapted this, and the new memory mapping now looks like this (A500 with 512 KB Slow Ram, some Fast Ram and an RTC attached):
Now the emulator correctly detects whether a Chip Ram extension is present (Slow Ram starting at memory bank C0). If memory is found, it is initialised with zeroes 🥳. Next step will be:
Unfortunately, the emulator tends to show the beach ball if the inspector is open while the main window is in the background. This is a Mac-related problem and due to the fact that I'm not really familiar with handling auxiliary windows in OS X. I might look into this first before I continue here...
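The incomplete address decoding discussed above can be sketched as a small C++ decoder. This is hypothetical and neither the logic from Gary.v nor vAmiga's memory mapper. It assumes that "the upper three address bits" means A23..A21 = %110 (i.e. $C00000-$DFFFFF), that the 512 KB Slow Ram occupies $C00000-$C7FFFF and the RTC sits in bank $DC, and that only A8..A1 select the register, so every mirror of a custom register hits the same register.

```cpp
#include <cassert>
#include <cstdint>

enum class Target { SlowRam, Rtc, Custom, Other };

// Hypothetical decoder for the $C00000-$DFFFFF area (assumptions listed above).
inline Target decode(uint32_t addr) {
    addr &= 0xFFFFFF;                                           // 24-bit address bus
    const uint32_t bank = addr >> 16;
    if (bank >= 0xC0 && bank <= 0xC7) return Target::SlowRam;  // 512 KB Slow Ram
    if (bank == 0xDC) return Target::Rtc;                       // real-time clock
    if ((addr >> 21) == 0b110) return Target::Custom;           // A23..A21 = %110
    return Target::Other;
}

// Only A8..A1 are decoded, so mirrors map to the same register offset.
inline uint16_t registerOffset(uint32_t addr) {
    return static_cast<uint16_t>(addr & 0x1FE);
}
```

The check order encodes priority: more specific devices (Slow Ram, RTC) are tested before the catch-all custom register range.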
After removing a stupid bug in Memory::poke32(), Kickstart has decided that the machine has 256 KB of Chip Ram (which is good, because it has, well, 256 KB of Chip Ram). Then, it recognised that the CPU is a 68000 (by ruling out that it is a 68010 etc.). After that, a lot of memory init stuff is done (setting up exec jump tables etc.). This all looks good (as far as I can judge at the moment), so I'm finally here:
Wow, a historic moment 🤭. I am a bit scared of what will happen. OK, let's be brave and press the Step button again 😬: Woohoo, the supervisor flag is cleared 😀. After experiencing this "historic moment", I need a break. Stepping through Kickstart is exhausting... Just noticed that the "Data" column in the CPU panel is wrong. Need to fix this first...
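Since a broken Memory::poke32() was enough to derail the memory detection, here is what a correct big-endian 32-bit write looks like. This is a sketch with a hypothetical ChipRam type, not vAmiga's Memory class. The 68000 stores the most significant byte at the lowest address, so a long word write can be expressed as two 16-bit writes (high word at addr, low word at addr + 2). The wrap-around indexing is a simplification standing in for the mirroring of 256 KB of Chip Ram, and the order of the two bus cycles is not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical Chip Ram model; indices wrap to mimic mirroring (simplification).
struct ChipRam {
    std::vector<uint8_t> ram = std::vector<uint8_t>(256 * 1024, 0);

    void poke16(uint32_t addr, uint16_t value) {
        ram[addr % ram.size()]       = static_cast<uint8_t>(value >> 8);    // MSB first
        ram[(addr + 1) % ram.size()] = static_cast<uint8_t>(value & 0xFF);
    }

    void poke32(uint32_t addr, uint32_t value) {
        poke16(addr,     static_cast<uint16_t>(value >> 16));     // high word
        poke16(addr + 2, static_cast<uint16_t>(value & 0xFFFF));  // low word
    }

    uint32_t peek32(uint32_t addr) const {
        uint32_t v = 0;
        for (uint32_t i = 0; i < 4; ++i) v = (v << 8) | ram[(addr + i) % ram.size()];
        return v;
    }
};
```

A round trip through poke32() and peek32() is a cheap regression test for exactly this kind of bug.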
That's a completely epic moment for all of us 🖖! 🤗 That is the documented Kickstart exec by Markus Wandel, isn't it? He wrote those comments on February 3, 1989, so that's clearly a historic moment in February 2019. 🙃 From this time on, we are leaving supervisor mode and running in 68k user mode... certain 68k instructions like "stop" or "reset" do not work from this moment on (they are privileged and trigger a privilege violation exception in user mode)... That makes sense, because AmigaOS is a multitasking system...
I continued my journey through Kickstart. Unfortunately, I wasn't aware of the fact that Markus Wandel "only" documented exec, so at some point, the unavoidable happened: I left exec and entered the undocumented area. This means I'm in outer space now and completely on my own 👽😬. I kept on stepping, and at some point the emulator started to poke values into the Copper registers. So it was about time to work on that. To make a long story short: I don't have a working Copper yet, but I do have a Copper disassembler 😎. Along the way, I've also invented a new software development approach which I'm going to call "inverse prioritising". Its core idea is to postpone the most important things as long as possible. I'm so proud of this method that I'm considering publishing a book about it. The only thing that puzzles me is that nobody else has had this brilliant idea before 🤔. Anyway, at the moment, the Copper disassembler looks like this: While working on the disassembler, I was looking for some standard Copper assembly notation, but I didn't find any. I therefore invented my own. If it turns out that there is some kind of established notation (which I am not aware of, because I spent so much time on the C64 that the Amiga is brand new technology for me), I can easily change that.
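The instruction format a Copper disassembler has to decode is compact. If bit 0 of the first word is 0, the instruction is a MOVE (bits 8..1 select the register, and the second word is the data). Otherwise it is a WAIT, or a SKIP if bit 0 of the second word is also set, with the beam position and the comparison mask packed into the two words. The following C++ sketch decodes this format. Its output notation is made up for the example and is not vAmiga's. The illegal-MOVE check assumes the OCS rule from the HRM: registers below $80 are off-limits unless the CDANG bit in COPCON is set, and registers below $40 are never accessible.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

inline bool isMove(uint16_t ir1) { return (ir1 & 1) == 0; }
inline bool isSkip(uint16_t ir1, uint16_t ir2) { return (ir1 & 1) && (ir2 & 1); }

// OCS rule (assumed): without CDANG, registers below $80 are illegal;
// with CDANG set, registers below $40 are still illegal.
inline bool isIllegalMove(uint16_t ir1, bool cdang) {
    const unsigned reg = ir1 & 0x1FE;
    return reg < (cdang ? 0x40u : 0x80u);
}

// Disassembles one Copper instruction (ir1 = first word, ir2 = second word).
// The notation is invented for this example; the BFD bit (bit 15 of ir2) is ignored.
inline std::string disassemble(uint16_t ir1, uint16_t ir2, bool cdang = false) {
    char buf[64];
    if (isMove(ir1)) {
        std::snprintf(buf, sizeof buf, "MOVE $%04X, $%03X%s",
                      unsigned(ir2), unsigned(ir1 & 0x1FE),
                      isIllegalMove(ir1, cdang) ? " (illegal)" : "");
    } else {
        const unsigned vp = ir1 >> 8,          hp = ir1 & 0xFE;  // wait position
        const unsigned ve = (ir2 >> 8) & 0x7F, he = ir2 & 0xFE;  // compare mask
        std::snprintf(buf, sizeof buf, "%s (%02X,%02X) mask (%02X,%02X)",
                      isSkip(ir1, ir2) ? "SKIP" : "WAIT", vp, hp, ve, he);
    }
    return std::string(buf);
}
```

For example, the word pair $0180,$0FFF disassembles to "MOVE $0FFF, $180", a write of $0FFF into COLOR00.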
Although I'm still in the design phase, I have made considerable progress, and important design decisions are emerging. The first major decision was to move from a mixed event/polling-based design to a truly event-based design. A major part of the emulator is the DMA controller. The heart of the DMA controller is the event scheduler, which consists of several event slots. From a theoretical point of view, each event slot is a single state machine with timed transitions. Right now, there are 5 slots (meaning we have 5 state machines running in parallel):
To give an example, let's look at slot 3. In each HSYNC event, a slot-3 event is scheduled that triggers at the first horizontal beam position where DMA happens. If Disk DMA is enabled, this will be position 7. Once the event is served, the next DMA event is scheduled. Although this sounds simple to implement, it is not. The challenge here is to find out when the next DMA event happens for a given hpos. This depends on a lot of factors (DMA enable bits, lores / hires mode, vblank area etc.). To implement this efficiently, I decided to use a precomputed DMA event table. Whenever one of the influencing factors changes (e.g., the vblank area is entered), a DMA time slot allocation table is computed which resembles Fig. 6-9 in the Hardware Reference Manual. Let's test this out with the current prototype. If we enable Disk DMA, Sprite DMA, Audio DMA for channels 1 and 2, and bitplane DMA in the DMA inspector panel, the event table looks like this (Denise has three bitplanes enabled and runs in lores mode):
To speed things up, the emulator computes a jump table in addition to the event table. For each hpos value, the jump table indicates the hpos where the next event happens. E.g., at position 0x33, the jump table contains the value 0x3B, which is an L2 (lores bitplane 2 fetch event). I hope I haven't overlooked any theoretical flaws in this design.
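The event table and jump table idea can be sketched in C++ as follows. Everything here is a hypothetical reconstruction, not vAmiga's code: the enum names are made up, refresh slots are omitted, and the slot positions (Disk at $07/$09/$0B, Audio at $0D..$13, Sprites at $15..$33, and the lores fetch order -, L4, L6, L2, -, L3, L5, L1 within each 8-cycle unit) are assumptions modelled after HRM Fig. 6-9 that should be checked against the manual. With three lores bitplanes, DDFSTRT = $38 and sprites enabled, the sketch reproduces the jump from $33 to the L2 fetch at $3B mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical event identifiers (refresh slots omitted for brevity).
enum class Dma : uint8_t { None, Disk, Audio, Sprite, L1, L2, L3, L4, L5, L6 };

constexpr int HPOS_CNT = 0xE3;  // horizontal positions $00..$E2 (assumed PAL line length)

struct DmaTables {
    std::array<Dma, HPOS_CNT> event{};  // DMA event at each hpos
    std::array<int, HPOS_CNT> next{};   // next hpos (> h) with an event, or -1
};

inline DmaTables buildTables(bool disk, unsigned audioMask, bool sprites,
                             int bpu, int ddfstrt, int ddfstop) {
    DmaTables t;
    t.event.fill(Dma::None);

    if (disk)
        for (int h : {0x07, 0x09, 0x0B}) t.event[h] = Dma::Disk;
    for (int ch = 0; ch < 4; ++ch)
        if (audioMask & (1u << ch)) t.event[0x0D + 2 * ch] = Dma::Audio;
    if (sprites)
        for (int h = 0x15; h <= 0x33; h += 2) t.event[h] = Dma::Sprite;

    // Lores fetch unit of 8 cycles: -, L4, L6, L2, -, L3, L5, L1
    const Dma slot[8]  = { Dma::None, Dma::L4, Dma::L6, Dma::L2,
                           Dma::None, Dma::L3, Dma::L5, Dma::L1 };
    const int plane[8] = { 0, 4, 6, 2, 0, 3, 5, 1 };
    for (int start = ddfstrt; start <= ddfstop; start += 8)
        for (int i = 0; i < 8; ++i) {
            const int h = start + i;
            if (h < HPOS_CNT && plane[i] != 0 && plane[i] <= bpu) t.event[h] = slot[i];
        }

    // Jump table: a single right-to-left pass.
    int upcoming = -1;
    for (int h = HPOS_CNT - 1; h >= 0; --h) {
        t.next[h] = upcoming;
        if (t.event[h] != Dma::None) upcoming = h;
    }
    return t;
}
```

Because both tables are rebuilt in linear time, recomputing them whenever an influencing register changes stays cheap, while the per-event lookup of the next slot is a single array access.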
Now I'm at a point where the Copper list has been initialised. To examine the list, open the Copper inspector panel: The first list makes sense to me. The Copper is programmed to restore the initial values of the bitplane pointer and the sprite pointer registers. After that, it waits until a certain beam position is reached and jumps to the second Copper list. The second list does not seem to be initialised yet, because the commands make no sense. Commands in red indicate an illegal command. Illegal commands are MOVE commands accessing custom registers Copper has no access to. It'll be interesting to see how Copper executes these commands. To see it, I would have to step to the point where Copper DMA is switched on. Unfortunately, I cannot step there yet, because the emulator stops at FCADBA with an error message:
The Kickstart code writes into one of the Blitter registers, so it seems the right time to start working on this component...
I've read through the Hardware Reference Manual. The bottom line is that the Blitter is easy from a functional point of view, but difficult if exact timing is taken into account. As stated in the HRM, we have to deal with varying time slice patterns (depending on the enabled DMA channels): I have reviewed a couple of existing Blitter implementations (with UAE being the most cryptic one again) and came to the conclusion that I want to try something new here. I'm going to control my virtual Blitter via emulated micro instructions. More precisely: When the BLTSIZE register is written to (which starts a blit), the emulator will analyse the current DMA configuration and set up a micro instruction list. After that, the event scheduler will be programmed to trigger Blitter events, and each event will then execute a single micro instruction. Here is an example instruction list for the first Blitter configuration in Table 6-2.
The micro instructions allow me to emulate the data flow in the real Blitter quite accurately (the Blitter is designed as a traditional pipeline, with "hold" registers forming an intermediate pipeline stage). Although this approach sounds promising to me (because of its flexibility), I'm totally unsure if this is the right way to go. Time will tell…
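A stripped-down version of the micro-instruction idea might look like the C++ sketch below. The instruction names, buildProgram() and BlitterSketch are hypothetical; only the USEA..USED bit positions (bits 11..8 of BLTCON0) come from the hardware. The generated per-word program simply fetches every enabled source and then writes D, so it does not reproduce the pipelined cycle patterns of HRM Table 6-2; those would have to be encoded in the program instead. If no channel is enabled, the program degenerates to a single idle step per word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical micro-instruction set for a slot-by-slot Blitter.
enum class Micro : uint8_t { FetchA, FetchB, FetchC, WriteD, Idle };

// Channel-enable bits in BLTCON0 (USEA..USED = bits 11..8).
constexpr uint16_t USEA = 1 << 11, USEB = 1 << 10, USEC = 1 << 9, USED = 1 << 8;

// Simplified per-word program: fetch all enabled sources, then write D.
// This does NOT reproduce the pipelined cycle patterns of HRM Table 6-2.
inline std::vector<Micro> buildProgram(uint16_t bltcon0) {
    std::vector<Micro> prog;
    if (bltcon0 & USEA) prog.push_back(Micro::FetchA);
    if (bltcon0 & USEB) prog.push_back(Micro::FetchB);
    if (bltcon0 & USEC) prog.push_back(Micro::FetchC);
    if (bltcon0 & USED) prog.push_back(Micro::WriteD);
    if (prog.empty()) prog.push_back(Micro::Idle);  // no channels: just burn a slot
    return prog;
}

struct BlitterSketch {
    std::vector<Micro> prog;
    std::size_t pc = 0;
    uint32_t wordsLeft = 0;

    // Called when BLTSIZE is written.
    void start(uint16_t bltcon0, uint32_t words) {
        prog = buildProgram(bltcon0);
        pc = 0;
        wordsLeft = words;
    }

    // Called once per Blitter event: executes one micro instruction.
    // Returns false if the blit has already finished.
    bool step() {
        if (wordsLeft == 0) return false;
        // A real implementation would perform prog[pc] here (bus access, shifting, ...).
        if (++pc == prog.size()) { pc = 0; --wordsLeft; }
        return true;
    }
};
```

Each call to step() executes exactly one micro instruction, which mirrors the "one event, one micro instruction" scheme described above: a three-word blit using channels A and D takes six events.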
Now, I'm at
This writes a value into BLTSIZE which means that a Blitter operation is about to come 😬. Let's step over it ...
Kickstart is starting the Blitter with the BLTCON = 0 config in Table 6-2. OK, there is no microcode for that config yet... I thought it was stupid to use this configuration, and now it's the first one being used 🙈.
OK, now the Blitter has some microcode for its most meaningless mode. I also tweaked the debug output a little, so it's easier to see what's going on internally:
Woohoo, for the first time there is a pending message in the Blitter slot 🥳. But wait... it has been overdue for 7 cycles... this should never happen 😖.
A brief update about what's going on in the OCS family. Finally, Paula got her own interrupt scheduler. It's a rather sophisticated device that allows her to trigger interrupts in certain cycles, e.g. five cycles from now, with little computational overhead. She hasn't really used it so far, because there were simply no IRQ requests. I told her to be patient a little while longer, as this is going to change soon. Denise became jealous, because she is the component with the fewest lines of code so far. Because her sister got this super cool interrupt scheduler, she now insists on getting the pixel engine I promised her some time ago. I told her we had to debug Copper first, because a pixel engine without a Copper is pretty useless. Somehow I felt she wasn't really listening. Agnus is quite happy with his event scheduler. He says that planning events is much more fun than polling regularly. Unfortunately, it's still not an easy task for him. He continues to plan events with invalid time stamps and the like, but I am pretty confident that he will improve over time. He also likes to be the one in charge of the bus. At first, he had the idea to exclude his little sisters from the bus and keep all cycles for himself. I tried to convince him that this was not possible. We have to follow the rules, namely the DMA time slot allocation as stated in the Hardware Reference Manual. I'm not sure he really understood what I meant, because he still does strange things from time to time. Besides my struggle with the OCS chips, I continued stepping through Kickstart to a point where the real trouble begins 😬: As you can see, Kickstart has enabled all kinds of DMA now. The rest of the story can be told rather quickly. When Copper saw his DMA flag set, he ran off like crazy 🤪, scheduled some weird events and crashed the whole thing 🙈. Well, as I said above: I need to debug Copper first 😖.
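One way to build such an interrupt scheduler is sketched below in C++. This is an assumption about how it could work, not Paula's actual implementation in vAmiga, and all names are hypothetical. Each of the 14 INTREQ sources (bits 0..13) gets its own trigger cycle, and the earliest pending trigger is cached, so the common "nothing is due yet" case costs a single comparison per check.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>

constexpr int64_t kNever = std::numeric_limits<int64_t>::max();

class IrqScheduler {
public:
    IrqScheduler() { trigger.fill(kNever); }

    // Requests INTREQ source 'bit' (0..13) to fire 'delay' cycles after 'now'.
    void schedule(int bit, int64_t now, int64_t delay) {
        trigger[bit] = now + delay;
        if (trigger[bit] < nextTrigger) nextTrigger = trigger[bit];
    }

    // Called from the emulation loop; returns the INTREQ bits that became due.
    uint16_t serve(int64_t now) {
        if (now < nextTrigger) return 0;  // fast path: nothing due yet
        uint16_t due = 0;
        nextTrigger = kNever;
        for (int bit = 0; bit < 14; ++bit) {
            if (trigger[bit] <= now) {
                due |= static_cast<uint16_t>(1u << bit);
                trigger[bit] = kNever;
            } else if (trigger[bit] < nextTrigger) {
                nextTrigger = trigger[bit];
            }
        }
        return due;
    }

private:
    std::array<int64_t, 14> trigger{};
    int64_t nextTrigger = kNever;
};
```

The returned bits would then be ORed into INTREQ; the slow path only runs when at least one trigger is due.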
After fixing some bugs, it’s time to give Copper another chance. The fun starts when Copper's DMA flag is set:
Looks good so far ... the first MOVE command has been executed 😎.
That’s the
command. Let’s check what kind of effect that had ...
Good news here, Copper went idle. Now the question is if he's going to wake up exactly at the specified beam position 🤔… Copper: (12,0): COP_FETCH: coppc = 46C copins1 = 8A 😃 Yeah, it continues at (12,0). The next command is the MOVE command writing into the Strobe register. This is going to redirect us to the second Copper list.
The second list consists of a single command. It's a WAIT statement that never triggers and therefore disables the Copper.
Let's keep our fingers crossed 🤞...
So, let’s check our event list. The Copper slot should be disabled by now:
Pretty cool 🥳. Copper successfully processed his first list.
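Why can a single WAIT switch the Copper off for the rest of the frame? The C++ sketch below (hypothetical function names) implements the usual masked WAIT comparison under a few assumptions: only the low 8 bits of the vertical beam position are compared, vertical bit 7 cannot be masked, the Blitter-finished-disable bit is ignored, and the last horizontal position of a line is $E2. Under these assumptions, the classic end-of-list instruction $FFFF,$FFFE waits for horizontal position $FE or later on line $FF, which the beam never reaches.

```cpp
#include <cassert>
#include <cstdint>

constexpr int MAX_HPOS = 0xE2;  // last horizontal position of a line (assumed)

// True once the beam (vpos, hpos) has reached the position of a WAIT (ir1, ir2).
// Assumptions: only vpos bits 7..0 are compared, vertical bit 7 cannot be
// masked, and the Blitter-finished-disable bit (bit 15 of ir2) is ignored.
inline bool waitReached(int vpos, int hpos, uint16_t ir1, uint16_t ir2) {
    const int vMask = ((ir2 >> 8) & 0x7F) | 0x80;
    const int hMask = ir2 & 0xFE;
    const int vWait = (ir1 >> 8) & vMask;
    const int hWait = ir1 & hMask;
    const int v = vpos & 0xFF & vMask;
    const int h = hpos & hMask;
    return v > vWait || (v == vWait && h >= hWait);
}

// Brute-force check over every beam position of a (generously sized) frame.
inline bool waitCanTrigger(uint16_t ir1, uint16_t ir2) {
    for (int v = 0; v < 512; ++v)
        for (int h = 0; h <= MAX_HPOS; ++h)
            if (waitReached(v, h, ir1, ir2)) return true;
    return false;
}
```

An emulator would not brute-force this, of course; the loop only demonstrates that no beam position satisfies the end-of-list WAIT, so the Copper can simply be put to sleep until the next vertical blank.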
Time for a brief update. After providing each custom chip with some basic functionality (Agnus schedules events, Denise does DMA, Paula triggers interrupts), the OCS kids seem to be happy with what they have (except Denise, who is still angry because she didn't get a pixel engine yet). The problem is that the custom chips run in an endless loop now. I expected them to draw the hand & disk picture eventually, but they don't seem to care about what I want 🙁. Because endless loops are hard to debug, I decided to work on some missing stuff in the hope that one of the missing pieces is the cause of the infinite loop. One of these things is drive identification. Hence, my current goal is to let the internal drives identify themselves correctly as 3.5” DD drives. Fortunately, the identification happens in documented Kickstart land:
This is also the place where we could tell Kickstart we had an HD drive 😎. $AAAAAAAA is the secret passphrase.
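The serial identification can be modelled with a tiny shift register. The C++ sketch below is hypothetical, not vAmiga's drive code. It assumes that after a reset each read clocks out one bit of the 32-bit ID, most significant bit first, and it ignores the real reset condition and the polarity of the RDY line. The ID values often quoted in documentation, $FFFFFFFF for a standard 3.5" DD drive and $AAAAAAAA for an HD drive, should likewise be treated as assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of serial drive identification: after reset(), each
// nextBit() call shifts out one bit of the 32-bit ID, MSB first.
class DriveId {
public:
    explicit DriveId(uint32_t id) : id_(id) {}

    void reset() { index_ = 0; }

    bool nextBit() {
        const bool bit = (id_ >> (31 - (index_ & 31))) & 1;
        ++index_;
        return bit;
    }

private:
    uint32_t id_;
    unsigned index_ = 0;
};

// What the identification loop effectively does: read 32 bits and reassemble them.
inline uint32_t readDriveId(DriveId& drive) {
    drive.reset();
    uint32_t id = 0;
    for (int i = 0; i < 32; ++i) id = (id << 1) | (drive.nextBit() ? 1u : 0u);
    return id;
}
```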
OK, I can now transfer any 32-bit drive identification key over the RDY line serially. This is nice, but pretty useless at the moment. Why? Because Kickstart knows that df0 is always a standard drive and therefore skips the serial transmission step for it:
The mechanism only becomes important when external drives come into play. Seems like I have to come up with another idea to tackle my infinite loop problem 🤔.
Some news about the hand & disk screen hunt. My goal is to reach memory location FC570E. This is where the BLTSIZE register is written to with non-trivial values, and the emulator is supposed to blow up there (it's supposed to blow up, because there is no Blitter microcode for non-trivial blits, but that's another story and has been done on purpose). By stepping back manually through the Omega CPU trace log, I was able to identify the following sequence of memory locations:
Between those addresses, a lot of subroutine stuff is going on. The good news is that vAmiga already reaches FE89B0. This is where Bitplane DMA is switched off. Hence, it remains to check where in this sequence vAmiga gets lost.
I am doing the same back-stepping in Omega and spotted the first BLTSIZE write at
which corresponds perfectly to the address Dirk spotted. Here is a short window of Omega's full instruction log, shortly before the Blitter action follows:
Dirk said vAmiga is currently at this address:
fe89b0: move.w #$100, $dff096.l   --> this is Omega's 2462237th instruction since start
fc570e: move.w D0, ($58,A0)   <--- first write to BLTSIZE, which is Omega's 2488366th instruction since start
So vAmiga's CPU still has to process 26129 instructions, which is only about 1% of all instructions processed so far... vAmiga has already taken 99% of the route to the hand and disk drawing 😀...
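The instruction at fe89b0 relies on the SET/CLR convention of DMACON, which INTENA and INTREQ share: bit 15 of the written value decides whether the 1-bits in bits 0..14 are set or cleared, and 0-bits leave the register untouched. Since bit 15 of $0100 is clear, the write clears bit 8 (BPLEN) only, which matches the observation that Bitplane DMA is switched off there. A minimal C++ helper (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Applies a write to a SET/CLR-style register such as DMACON, INTENA or INTREQ:
// bit 15 selects set (1) or clear (0) for every 1-bit in bits 0..14.
inline uint16_t applySetClr(uint16_t current, uint16_t value) {
    const uint16_t bits = value & 0x7FFF;
    return (value & 0x8000) ? static_cast<uint16_t>(current | bits)
                            : static_cast<uint16_t>(current & ~bits);
}
```

Writing $7FFF with bit 15 clear therefore clears the whole register, while $8100 would switch Bitplane DMA on without touching any other bit.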
Kickstart v1.2 full instruction trace log until the hand drawing code (executed by Omega)... i.e., from the very first instruction until instruction 3129644, where the hand and disk image is drawn...
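Instead of stepping back by hand, a trace like this can be diffed against one recorded by the emulator under test. The C++ helper below is a hypothetical sketch that compares only program counters, so it finds the first instruction where the two control flows part ways. A divergence caused by a wrong flag or register value (such as a comparison that evaluates differently in two emulators) only shows up a few instructions later, at the branch that depends on it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the index of the first instruction at which two program-counter traces
// differ, or std::nullopt if the shorter trace is a prefix of the longer one.
inline std::optional<std::size_t> firstDivergence(const std::vector<uint32_t>& reference,
                                                  const std::vector<uint32_t>& candidate) {
    const std::size_t n = std::min(reference.size(), candidate.size());
    for (std::size_t i = 0; i < n; ++i)
        if (reference[i] != candidate[i]) return i;
    return std::nullopt;
}
```

Logging full register sets instead of bare PCs would move the detected divergence closer to its root cause, at the price of much larger trace files.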
Here’s the thing. After vAmiga reaches fe89b0, it eventually executes fc0716 (and so does Omega). The first comparison is false, so it does not branch. This means that the jsr (-$13e,A6) is taken (same in Omega). After returning, it jumps to the comparison statement again. In Omega, the comparison is now true, but in vAmiga it’s still false. The second jsr (-$13e,A6) never returns. There is more than one function with offset -$13e:
This bug is a nightmare!
Wow, it is beautiful. The blitter inside Agnus has done this, right? Look at the cleanly drawn edges of the floppy disk. Look at the colors. That is inspiring... Green, red and blue... Oh no, the colors are wrong, the OCS kids used the wrong colors... and why did they suddenly stop drawing? Looks like they are quarrelling again...?
The strange colours are my fault. Because Denise just started her drawing lessons, I decided to withhold the original palette from her. For practising, I gave her only four basic pencils: a black one, a red, a green, and a blue one. However, I need to have a serious word with Copper. In the middle of each frame, he constantly takes away all her pencils. So mean!
😎 I have to admit that I cheated a little bit. I've shamelessly copied over the line-mode Blitter stuff from the Omega emulator 🙄. The copy Blitter stuff is original vAmiga, though. The next step will be to get the texture dimensions right. The emulator is still using the original texture drawing stuff from VirtualC64.
A brief update: Firstly, the screen buffer size has been changed to 768 x 288. Secondly, bitplane DMA has been decoupled from the drawing code. There are separate events for bitplane DMA and pixel synthesis now. This makes the design very flexible, although the exact timing is certainly still wrong. I've done a brief comparison of screen geometries: Left is Omega, middle is vAmiga, right is a UAE clone (presumably PAL). Don't get confused by the vAmiga picture. For debugging, the emulator is currently displaying the whole 1024 x 512 GPU texture. The blue area is unused texture area. The orange area contains a debug pattern (yellow and red stripes). This area is writable by the emulator, but hasn't been written to. As you can see, Omega has a smaller lower border, which is most likely due to NTSC emulation. The vAmiga geometry (PAL) looks roughly the same as the picture on the right, so I think I am on the right track...
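The pixel synthesis step that now runs as its own event boils down to a planar-to-chunky conversion. A C++ sketch under simplifying assumptions (the function name is hypothetical, and HAM/EHB as well as the colour register lookup are omitted): bit 15 of each bitplane data word belongs to the leftmost of its 16 pixels, and bitplane n contributes bit n-1 of the colour register index.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Converts up to six bitplane data words into 16 colour register indices.
// bpldat[0] holds bitplane 1, bpldat[1] bitplane 2, and so on.
inline std::array<uint8_t, 16> planarToChunky(const uint16_t (&bpldat)[6], int bpu) {
    std::array<uint8_t, 16> index{};
    for (int px = 0; px < 16; ++px) {
        const int shift = 15 - px;  // leftmost pixel = bit 15
        uint8_t color = 0;
        for (int plane = 0; plane < bpu; ++plane)
            color |= static_cast<uint8_t>(((bpldat[plane] >> shift) & 1) << plane);
        index[px] = color;
    }
    return index;
}
```

Keeping this conversion separate from the bitplane fetch events means the fetched words can be buffered at DMA time and turned into pixels later, which matches the decoupling described above.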
The first draft of the GPU pipeline architecture has been completed and implemented. Details are here: https://github.com/dirkwhoffmann/vAmiga/wiki/GPU
Using the new pipeline, the current output looks like this: I've also managed to port the 2x upscaler from VirtualC64 to vAmiga. Using 2x upscaling, the picture is indeed a lot smoother: At the moment, I don't plan to support 4x upscaling (as VirtualC64 does), because it would require a very large internal texture size of 4096 x 4096. For the C64, 2048 x 2048 was sufficient. There is still a long way to go to V0.1, because the whole thing is still pretty unstable.
I just reworked the graphics pipeline (because the 2x upscaler had a bug) and noticed that 4096 x 4096 textures don't seem to be an issue for modern GPUs. Hence, 4x upscaling will be supported. Here is the result: Original Amiga texture: 2x upscaling (EPX algorithm): 4x upscaling (xBR algorithm):
Hmmm, when enabling all graphics effects (i.e., Gaussian blur), GPU performance on my (not so old) MacBook Pro goes down to 40 fps. Seems like a final texture size of 4096 x 4096 stresses the GPU too much. Maybe it's better to go with a final texture size of 2048 x 2048 (which requires the 4x upscaler to be removed 😢).
Now that I've thought about it a little longer, we can still achieve 4x upscaling with a 2048 x 2048 texture, at least in lores mode. In lores mode, each pixel has size 4 x 4, so we can apply an upscaling algorithm "inside" the original texture. In hires mode, we can at least upscale vertically, because the even and odd lines are the same. The only mode that can only be upscaled 2x is hires+interlaced, but this mode is rarely used anyway.
Back in the game at 60 fps. 2x upscaling, Gaussian blur, Trinitron dot mask + electron beam misalignment 😎: Now, as EPX and xBR both do 2x scaling, they can be compared directly (first = original, second = EPX upscaled, third = xBR upscaled). Hmm, when looking up close at the xBR image, it looks like there is a bug in the xBR implementation. The line contains strange jagged edges. If this is a bug, it's also contained in VirtualC64 🤔.
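For reference, the EPX rules themselves are small. The following C++ function is a CPU sketch of EPX (equivalent to Scale2x), not the Metal shader used in vAmiga or VirtualC64. Each source pixel P becomes a 2x2 block, and a sub-pixel takes the colour of two agreeing neighbours only if the other two neighbours disagree with them. Clamping at the image border is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// EPX / Scale2x on a w x h image of packed colours; returns a 2w x 2h image.
inline std::vector<uint32_t> epx2x(const std::vector<uint32_t>& src, int w, int h) {
    auto at = [&](int x, int y) {  // neighbours outside the image are clamped
        x = x < 0 ? 0 : (x >= w ? w - 1 : x);
        y = y < 0 ? 0 : (y >= h ? h - 1 : y);
        return src[static_cast<std::size_t>(y) * w + x];
    };
    std::vector<uint32_t> dst(static_cast<std::size_t>(4) * w * h);
    const int dw = 2 * w;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            const uint32_t p = at(x, y);
            const uint32_t a = at(x, y - 1), b = at(x + 1, y);  // above, right
            const uint32_t c = at(x - 1, y), d = at(x, y + 1);  // left, below
            uint32_t p1 = p, p2 = p, p3 = p, p4 = p;
            if (c == a && c != d && a != b) p1 = a;  // top-left
            if (a == b && a != c && b != d) p2 = b;  // top-right
            if (d == c && d != b && c != a) p3 = c;  // bottom-left
            if (b == d && b != a && d != c) p4 = d;  // bottom-right
            dst[(2 * y) * dw + 2 * x]         = p1;
            dst[(2 * y) * dw + 2 * x + 1]     = p2;
            dst[(2 * y + 1) * dw + 2 * x]     = p3;
            dst[(2 * y + 1) * dw + 2 * x + 1] = p4;
        }
    return dst;
}
```

A slow but easy-to-check CPU version like this can serve as a reference when comparing shader output pixel by pixel.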
It's a bug. I've just converted a JavaScript xBR implementation to Metal to compare the results. The upper picture shows how it is supposed to look, and the lower picture is the current GPU implementation. I am going to investigate this first, because it also affects VirtualC64. (I cannot simply replace the old implementation, because the JavaScript port is not GPU-optimised and thus comparatively slow.)
I've experimented a little with a two-phase upscaling pipeline. The first upscaler works "inside" the emulator texture to enhance lores images. The second upscaler is the already implemented one. The result (EPX in-texture upscaling + xBR) looks pretty promising: The problem here is that picture quality decreases in hires mode, so we have to take care of usability. Maybe I can enhance the upscaler to detect lores fragments automatically and only apply the first upscaling step to the lores parts.
Finally, I think I got it right. vAmiga now has a two-pass upscaling pipeline that can be configured individually in the video preferences: I also figured out that the xBR shader relies on pixel blending to work properly. The result is quite nice, I think. Upper picture: Original Amiga output (pipeline setting: none + none)
Looks brilliantly sharp, like a vector monitor now. I once played Asteroids on a Vectrex. That was as sharp as the results of your two-pass xBR pipeline.
It starts talking to us... It is already some kind of dialog, I think. It says something to us and expects an action from us in the form of a left mouse button click. Surely some kind of intelligent life form... Maybe it wants to test your reaction...
Yes, it becomes scary now 😬. Fortunately, it only shows this mean reaction when I restore it from a snapshot. Maybe it doesn't like its brain being frozen 🤔. Denise really has to work on her handwriting. The Guru Meditation is hardly readable. Opening two emulator instances in parallel seems to work. I therefore conclude that it doesn't have a problem with multiple personalities.
Oh, just noticed that I don't save the CPU state into the snapshot 🙄. So when it wakes up from hyper sleep, it works with its current brain state on old memories. No wonder he's a little disoriented. Must be like a big hangover for him 🥴. Poor thing 😟.
Time to release a first beta of vAmiga 0.1 😎: http://www.dirkwhoffmann.de/vAMIGA/vAmiga_0.1b1.zip
If you want to check it out, make sure that an original Kickstart is used (1.2 or 1.3). Booting the Aros replacement produces a permanent black screen. vAmiga follows the tradition of UAE 0.1: It runs Kickstart to the point where the hand & disk logo shows up, but because the disk drive is still a stub, it cannot go further. Nevertheless, many GUI functions already work. They give you a glimpse of what the emulator will look like. You can take snapshots and screenshots, emulate keystrokes by typing on a virtual keyboard, or adjust video settings. The CPU, Memory and Event Inspector windows are nearly complete. You can stop the emulator there and single-step through the code, view the memory layout and content, or watch the currently scheduled events (vAmiga is event-based, so there's a lot going on there). Any bug reports are welcome, even tiny things. Finding bugs shouldn't be too difficult at this early development stage 🙈.
Hi all, to me the Kickstart hand seems shifted to the left rather than centered.
Hi everyone,
Please ignore the centering issue for now (I am still displaying almost the full texture). I'll take care of that once the DIWSTRT and DIWSTOP registers are implemented properly (but to do this, I need to be able to boot the Workbench first).
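For reference, this is how the two display window registers are laid out on OCS, as a C++ sketch with hypothetical struct and function names. The low byte holds the horizontal position H7..H0 in lores pixels, with H8 implied as 0 for DIWSTRT and 1 for DIWSTOP. The high byte holds V7..V0, with V8 implied as 0 for DIWSTRT and as the inverse of V7 for DIWSTOP. For the commonly used PAL values $2C81 and $2CC1, this yields a window of 320 x 256 pixels starting at horizontal position $81 and line $2C.

```cpp
#include <cassert>
#include <cstdint>

struct DisplayWindow { int hstart, vstart, hstop, vstop; };

// OCS decoding of DIWSTRT / DIWSTOP (implied H8 and V8 bits as described above).
inline DisplayWindow decodeDiw(uint16_t diwstrt, uint16_t diwstop) {
    DisplayWindow w{};
    w.hstart = diwstrt & 0xFF;             // H8 = 0
    w.vstart = diwstrt >> 8;               // V8 = 0
    w.hstop  = (diwstop & 0xFF) | 0x100;   // H8 = 1
    const int v7 = (diwstop >> 15) & 1;
    w.vstop  = (diwstop >> 8) | (v7 ? 0 : 0x100);  // V8 = !V7
    return w;
}
```

Centering the picture then amounts to cropping the emulator texture around the decoded window instead of showing the full texture.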
I am going to use this thread as a logbook in the near future to document the progress towards version 0.1. V0.1 should have about the same functionality as UAE 0.1. We'll then have a (completely useless) emulator that can do nothing but show the Workbench initial screen. To put it in a picture:
This is the current situation:
We have
So let’s see how far we can get with this. These are the first lines of Kickstart 1.2 which I want to step through:
Before we can get started, we need to install the Kickstart Rom. This is done in the hardware preferences. By default, the Aros replacement Rom is installed.
It can be replaced by an original Rom via drag & drop, so why stick to the clone if we can have the real stuff 😎:
The Kickstart Rom is usually located in the upper memory area. On startup, the Amiga mirrors it in the lower memory banks to enable the CPU to find the correct start vector. The memory inspector shows the details:
When powering on the Amiga, the CPU loads the start vector from the mirrored Kickstart Rom and jumps to address FC00D2. For testing purposes, I let the emulator stop at a predefined breakpoint at FC00DE, which can be seen in the CPU panel:
Let’s set another breakpoint at FC00FE by double clicking the corresponding line in the program window:
By pressing the Run button the CPU starts and stops at FC00FE.
Pretty nice so far 🥳, but at this point the Kickstart Rom writes into the CIA registers 🙁. Two CIAs are already present in the current implementation, but they are not yet connected to memory. Therefore I have to stop here. I'll continue this thread once the CIAs are connected. Stay tuned ...
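The reset sequence described at the start of this post can be sketched in C++. This is not vAmiga's code, and the function names are hypothetical. On reset, the 68000 fetches the initial supervisor stack pointer from address 0 and the initial program counter from address 4, both as big-endian long words. With the overlay active, these addresses are served by the start of the Kickstart image, so both vectors can be read directly from the ROM bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a big-endian 32-bit value (most significant byte first).
inline uint32_t readBigEndian32(const std::vector<uint8_t>& rom, std::size_t offset) {
    return (uint32_t(rom[offset]) << 24) | (uint32_t(rom[offset + 1]) << 16) |
           (uint32_t(rom[offset + 2]) << 8) | uint32_t(rom[offset + 3]);
}

struct ResetVectors { uint32_t ssp, pc; };

// With the overlay active, addresses 0 and 4 map to the start of the ROM image.
inline ResetVectors resetVectors(const std::vector<uint8_t>& rom) {
    return { readBigEndian32(rom, 0), readBigEndian32(rom, 4) };
}
```

Once the overlay bit is cleared, the same addresses map to Chip Ram, which is why Kickstart has to jump into its own address range before switching the overlay off.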